Workbench in 5¶
Building models in DataRobot—regardless of the data handling, modeling options, prediction methods, and deployment actions—comes down to the same five basic actions. Review the steps below to get a quick understanding of how to be successful in DataRobot.
Learn more
Learn about the streamlined, iterative workflow process in the fundamentals of Workbench.
Datasets for testing¶
You can use the following hospital readmissions demo datasets to try out DataRobot functionality:
Download training data Download scoring data
These sample datasets are provided by BioMed Research International to study readmissions across 70,000 inpatients with diabetes. The researchers of the study collected this data from the Health Facts database provided by Cerner Corporation, which is a collection of clinical records across providers in the United States. Health Facts allows organizations that use Cerner’s electronic health system to voluntarily make their data available for research purposes. All the data was cleansed of PII in compliance with HIPAA.
1: Create a Use Case¶
From the Workbench directory, click Create Use Case in the upper right.
Provide a name for the Use Case and click the check mark to accept. You can change this name at any time by opening the Use Case and clicking the existing name.
2: Work with data¶
Working with data involves importing (or connecting), exploring, and preparing your data. Three steps and your data is ready for modeling.
Add data to your Use Case via local file, the Data Registry, or a data connection.
Not only do data connections minimize data movement, they also allow you to interactively browse, preview, profile, and prepare your data using DataRobot's integrated data preparation capabilities.
While a dataset is being registered in Workbench, DataRobot also performs EDA1—analyzing and profiling every feature to detect feature types, automatically transform date-type features, and assess feature quality. Once registration is complete, you can explore the information uncovered while computing EDA1.
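To make the idea of feature profiling concrete, here is a minimal sketch in plain Python. It is not DataRobot's EDA1 implementation; it only illustrates the kind of per-feature summary (a guessed type, a missing-value count, a unique-value count) that such a pass might produce. The function names, the three coarse types, the single supported date format, and the sample columns are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess a coarse feature type from the non-missing string values."""
    present = [v for v in values if v.strip() != ""]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-missing value parses without error.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    # Assumption: only ISO-style dates (YYYY-MM-DD) are recognized here.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "categorical"


def profile(csv_text):
    """Return {column: {"type", "missing", "unique"}} for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        report[column] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "unique": len({v for v in values if v.strip() != ""}),
        }
    return report


# Hypothetical three-row sample; not taken from the demo datasets.
sample = """age,admit_date,readmitted
54,2020-01-03,yes
61,2020-02-11,no
,2020-02-15,no
"""

for name, stats in profile(sample).items():
    print(name, stats)
```

A real profiling pass handles far more cases (multiple date formats, text, high-cardinality checks); the sketch only shows the shape of the output you explore after registration.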
If you've added data from a data connection, you can use DataRobot's wrangling capabilities, which provide a seamless, scalable, and secure way to access and transform data for modeling. In Workbench, "wrangle" is a visual interface for executing data cleaning at the source, leveraging the compute environment and distributed architecture of your data source.
When you've finished wrangling your dataset, you can "push down" your transformations to your data source, generating a new output dataset.
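One common way to realize push-down in general is to compile a list of transformation steps into a single query that the data source runs itself. The sketch below illustrates that idea only; this page does not describe how DataRobot represents or executes wrangling steps, so the step format, the `build_pushdown_query` function, and the table and column names are all hypothetical.

```python
def build_pushdown_query(table, steps):
    """Render hypothetical wrangling steps as one SQL statement."""
    columns = ["*"]
    filters = []
    for step in steps:
        if step["op"] == "filter":
            # Row filters become WHERE conditions.
            filters.append(step["condition"])
        elif step["op"] == "compute":
            # Computed features become extra SELECT expressions.
            columns.append(f'{step["expression"]} AS {step["name"]}')
        else:
            raise ValueError(f"unsupported step: {step['op']}")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        query += " WHERE " + " AND ".join(f"({f})" for f in filters)
    return query


# Hypothetical steps and names, for illustration only.
steps = [
    {"op": "filter", "condition": "age >= 18"},
    {"op": "compute", "name": "long_stay", "expression": "time_in_hospital > 7"},
]
print(build_pushdown_query("admissions", steps))
```

Because the whole recipe collapses into one statement, the source system does the work and only the resulting dataset needs to be materialized.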
3: Build experiments¶
Within a Use Case, build experiments using the data you have added and then start modeling. Each Workbench experiment is a set of parameters (data, targets, and modeling settings) that you can compare to find the optimal models to solve your business problem. First, add a new experiment.
Next, add data to the new experiment by selecting the dataset that you just loaded to the Use Case.
Select a target and start modeling.
4: Evaluate models¶
DataRobot's Leaderboard shows you all the models built for your experiment, ranked by performance to help with quick model evaluation.
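As a rough illustration of what "ranked by performance" means, the following sketch sorts a handful of model results by a validation score. It is not how the Leaderboard is implemented; the `ModelResult` class, the `rank_models` function, the model names, and the scores are all made up for the example. The `higher_is_better` flag reflects that some metrics (such as AUC) improve as they rise while others (such as LogLoss) improve as they fall.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    validation_score: float


def rank_models(results, higher_is_better=True):
    """Return results ordered best-first for the chosen metric direction."""
    return sorted(
        results,
        key=lambda r: r.validation_score,
        reverse=higher_is_better,
    )


# Made-up names and scores, for illustration only.
results = [
    ModelResult("Model A", 0.68),
    ModelResult("Model B", 0.71),
    ModelResult("Model C", 0.64),
]

leaderboard = rank_models(results)
for position, model in enumerate(leaderboard, start=1):
    print(position, model.name, model.validation_score)
```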
From the Leaderboard, create new feature lists or click a model to access visualizations for further exploration. These tools interpret, explain, and validate what drives a model’s predictions, and can help inform what to do in your next experiment.
You can also generate compliance documentation from the Leaderboard.
5: Make predictions¶
Once you have selected a model, you can make predictions with it to assess model performance before deploying. Select the model from the Leaderboard and then click Model actions > Make predictions.
On the Make Predictions page, upload a prediction source.
After you upload a prediction source, you can configure the prediction options and compute predictions.
Next steps¶
After you have built a model—and made predictions to test accuracy—register the model and ultimately deploy it.