

Model building walkthrough

This walkthrough shows you how to use DataRobot to identify at-risk patients, reduce readmission rates, maximize care, and minimize costs. You can learn more about the use case here. You will:

  1. Wrangle data.
  2. Build models.
  3. Evaluate performance.

Watch the full video here

Assets for download

To follow this walkthrough, download and then unzip the ZIP file. Inside you will find a TXT file, a CSV file, and another ZIP file.

Download training data
Download scoring data

1. Preview the Hospital Readmission dataset

From the Data tab within your Use Case, you can view all associated datasets. Click the dataset to view its features:

Explore the dataset’s feature structure and values.

Read more: Working with data

2. Wrangle data

Click Start > Start wrangling to pull a random sample of data from the data source and begin transformation operations.
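
As a rough illustration of what working on a random sample means, the sketch below draws a fixed-size sample from an in-memory list of rows. The function name `sample_rows`, the `patient_id` column, the sample size, and the seed are all hypothetical; how DataRobot actually samples the data source is not described here and may differ.

```python
import random

def sample_rows(rows, sample_size, seed=0):
    """Return up to `sample_size` rows chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    if sample_size >= len(rows):
        return list(rows)
    return rng.sample(rows, sample_size)

# Hypothetical source data: 1000 rows with a made-up identifier column.
rows = [{"patient_id": i} for i in range(1000)]
live_sample = sample_rows(rows, sample_size=100)
```

Transformations previewed on a sample like this are only applied to the full data source when the recipe is published.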

Read more: Wrangle data

3. Build a recipe

Click Add operation to build a wrangling "recipe." Each new operation updates the live sample to reflect the transformation. Note that if you wrangle your training dataset, you will want to apply the same operations to your scoring dataset to ensure you have the same columns.
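
One way to picture a recipe is as an ordered list of operations that can be replayed on any dataset. The sketch below is not DataRobot code: the helpers `drop_column`, `replace_value`, and `apply_recipe`, and the `weight` and `gender` columns, are hypothetical. It only shows why replaying the same recipe on training and scoring data keeps their columns aligned, assuming the scoring data lacks the target column.

```python
def drop_column(name):
    """Build an operation that removes one column from a row."""
    def operation(row):
        return {key: value for key, value in row.items() if key != name}
    return operation

def replace_value(column, old, new):
    """Build an operation that replaces one specific value in a column."""
    def operation(row):
        updated = dict(row)
        if updated.get(column) == old:
            updated[column] = new
        return updated
    return operation

def apply_recipe(rows, recipe):
    """Apply each operation, in order, to every row."""
    for operation in recipe:
        rows = [operation(row) for row in rows]
    return rows

recipe = [drop_column("weight"), replace_value("gender", "?", "Unknown")]

# Made-up rows standing in for the training and scoring datasets.
training = [{"age": "[50-60)", "gender": "Female", "weight": "?", "Readmitted": True}]
scoring = [{"age": "[70-80)", "gender": "?", "weight": "?"}]

training_out = apply_recipe(training, recipe)
scoring_out = apply_recipe(scoring, recipe)

# Apart from the target, both outputs should expose the same columns.
training_features = set(training_out[0]) - {"Readmitted"}
scoring_features = set(scoring_out[0])
```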

Read more: Add operations

4. Compute a new feature

The recipe panel lists a variety of possible wrangling operations. Click Compute new feature to create a new output feature from existing dataset features, perhaps one that better represents your business problem.

The f(x) feature configuration window is where you add functions and subqueries that define the new feature. Enter the name and expression listed below and click Add to recipe when done. The transformation converts an age range such as [50-60) into a single integer, the lower bound of the range.

New feature name: convert_age_range_to_integer

Expression: to_number(REGEXP_SUBSTR("age", '\\[(\\d+)-\\d+\\)', 1, 1, 'e'))
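
To see what the expression computes, here is a minimal Python sketch of the same regular expression, assuming age values look like `[50-60)`. The function name `age_range_to_integer` is hypothetical. The pattern is written as a Python raw string, so each backslash appears once rather than doubled as in the SQL string literal.

```python
import re

# Same pattern as the expression above: capture the digits after "[",
# then require "-", more digits, and a closing ")".
AGE_RANGE = re.compile(r"\[(\d+)-\d+\)")

def age_range_to_integer(age):
    """Return the lower bound of a range such as '[50-60)', or None if there is no match."""
    match = AGE_RANGE.search(age)
    return int(match.group(1)) if match else None
```

For example, `age_range_to_integer("[50-60)")` returns `50`, and a value that does not match the pattern returns `None`.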

Read more: Compute a new feature

5. Prepare for publishing

When you are finished adding operations, confirm from the live sample that the applied operations are ready for publishing. Click Publish recipe to configure the final publishing settings for the output dataset.

Set the criteria for the final output dataset, such as the name and, if enabled, the specifics of automatic downsampling. Click Publish to apply the recipe to the source. Publishing creates a new output dataset, registers it in the Data Registry, and adds it to your Use Case.

Read more: Publish a recipe

6. Explore the new dataset

The transformed, published dataset, identifiable by the wrangling timestamp, has been added to the Use Case’s Data tab. Click the dataset to see the final feature set, including the new, wrangled feature, and explore feature insights.

If the dataset needs further modification, you can choose to keep wrangling. Otherwise, from the new output dataset, click Start > Modeling to set up a new experiment.

Read more: View data insights

7. Create an experiment

After DataRobot prepares the dataset, enter the name of the column in the dataset that you would like to make predictions for (this is the target). For this Use Case, enter the target feature name Readmitted. DataRobot presents the target feature’s distribution in a histogram. The right panel summarizes the experiment settings. The list of features shown reflects the selected feature list.
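
The target distribution shown in the histogram is, in essence, a count of each target value. The sketch below reproduces that idea on a few made-up rows; the function name `target_distribution` and the sample data are hypothetical, and only the column name Readmitted comes from this walkthrough.

```python
from collections import Counter

def target_distribution(rows, target):
    """Count each target value and return (value, count, share) tuples, most common first."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common()]

# Made-up rows for illustration only.
rows = [
    {"Readmitted": False},
    {"Readmitted": True},
    {"Readmitted": False},
    {"Readmitted": False},
]

distribution = target_distribution(rows, "Readmitted")
for value, count, share in distribution:
    # Print a crude text histogram, one bar per target value.
    print(f"{value!s:<6} {count:>3} {'#' * round(share * 20)}")
```

A strongly imbalanced distribution is worth noticing before modeling, because it affects how accuracy metrics should be read.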

Read more: Select a target

8. Apply optional settings

Click Next to further refine your experiment.

DataRobot sets default partitioning and validation based on your data. However, changing experiment parameters is a good way to iterate on a Use Case. Notice the experiment summary information in the right panel. Click Start modeling to launch Autopilot.
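
For readers new to partitioning, the following sketch shows one common scheme: reserve a holdout set, then split the remaining rows into cross-validation folds. The function name `partition_rows`, the 20% holdout fraction, and the five folds are illustrative assumptions, not a statement of DataRobot's defaults, and the split here is plain random rather than stratified.

```python
import random

def partition_rows(n_rows, holdout_fraction=0.2, n_folds=5, seed=0):
    """Shuffle row indices, reserve a holdout set, and split the rest into folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps partitions reproducible
    n_holdout = int(n_rows * holdout_fraction)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]
    # Deal the remaining indices round-robin so fold sizes differ by at most one.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds

holdout, folds = partition_rows(1000)
```

Each fold serves once as validation data while the other folds are used for training; the holdout set stays untouched until the final check.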

Read more: Customize settings

9. Start modeling

Once modeling begins, Workbench starts constructing a model Leaderboard. Ultimately, DataRobot will select and retrain the most accurate model and mark it as prepared for deployment. While model building progresses, click on any completed model and familiarize yourself with the insights available for model evaluation. The Overview page displays available insights for the model, which differ depending on the experiment type.

Click Feature Impact, and compute if prompted, to visualize which features are driving model decisions.
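
One common way to measure which features drive a model is permutation importance: shuffle one feature's values and see how much accuracy drops. The sketch below illustrates that technique on a toy model; it is not a description of how DataRobot computes Feature Impact, which may use a different method. The `predict` function and the feature names `number_inpatient` and `num_lab_procedures` are hypothetical stand-ins.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(rows)

def permutation_impact(predict, rows, labels, feature, seed=0):
    """Accuracy lost when one feature's values are shuffled across rows."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[feature] for row in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
    return baseline - accuracy(predict, permuted, labels)

# Toy stand-in for a trained model: flags patients with two or more inpatient visits.
def predict(row):
    return row["number_inpatient"] >= 2

# Made-up data; labels are generated by the toy model itself, so baseline accuracy is 1.0.
rows = [{"number_inpatient": i % 4, "num_lab_procedures": 40 + i} for i in range(200)]
labels = [predict(row) for row in rows]

impacts = {feature: permutation_impact(predict, rows, labels, feature) for feature in rows[0]}
```

Here the feature the toy model ignores has zero impact, while shuffling the feature it depends on lowers accuracy.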

Read more: Evaluate experiments

10. View the modeling pipeline

Now click Blueprint to view the pre- and post-processing steps that go into building a model.

Read more: Blueprints

Next steps

When you are done investigating, you can access a variety of next steps for your model from Model actions:

Action | Description | Read more
Register model | Create versioned deployment-ready model packages. | Operate and govern walkthrough
Make predictions | Make one-time predictions on new data, registered data, or training data to validate Leaderboard models. | Make predictions from Workbench
Create app | Use No-Code AI App templates to build applications, using a no-code interface, that enable core DataRobot services and are shareable with other users, whether or not they have a DataRobot license. | Create an application
Generate compliance report | Compile and download model development documentation that can be used for regulatory validation. Blue italic text provides guidance and instruction; black text indicates automatically generated model compliance text: preprocessing, performance, impact, task-specific, and general model information. | Also available from Registry
Delete model | Permanently remove the selected model from the Use Case (and the associated Leaderboard). | N/A

Register and monitor deployed models in the operate and govern walkthrough.


Updated October 25, 2024