Model building walkthrough

This walkthrough shows you how to use DataRobot to identify at-risk patients, reduce readmission rates, maximize care, and minimize costs. You can learn more about the use case here. You will:

Wrangle data.
Build models.
Evaluate performance.

Watch the full video here

Assets for download

To follow this walkthrough, download and then unzip the ZIP file. Inside you will find a TXT file, a CSV file, and another ZIP file.

Download training data
Download scoring data

1. Preview the Hospital Readmission dataset

From the Data tab within your Use Case, you can view all associated datasets. Click the dataset to view its features.

Explore the dataset’s feature structure and values.

2. Wrangle data

Click Start > Start wrangling to pull a random sample of data from the data source and begin transformation operations.

3. Build a recipe

Click Add operation to build a wrangling "recipe." Each new operation updates the live sample to reflect the transformation. Note that if you wrangle your training dataset, you will want to apply the same operations to your scoring dataset to ensure you have the same columns.
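The idea of reusing one recipe on both datasets can be sketched in plain Python. This is only an illustration of the concept, not how DataRobot implements wrangling: the operations (drop_column, rename_column), the column names, and the sample rows are all hypothetical.

```python
# Hypothetical operations and column names, for illustration only.
def drop_column(name):
    """Build an operation that removes one column from every row."""
    def operation(rows):
        return [{k: v for k, v in row.items() if k != name} for row in rows]
    return operation

def rename_column(old, new):
    """Build an operation that renames one column in every row."""
    def operation(rows):
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return operation

def apply_recipe(recipe, rows):
    """Apply each operation in order and return the transformed rows."""
    for operation in recipe:
        rows = operation(rows)
    return rows

# One recipe, defined once, reused for both datasets.
recipe = [drop_column("patient_id"), rename_column("num_meds", "medication_count")]

training = [{"patient_id": 1, "num_meds": 12, "readmitted": True}]
scoring = [{"patient_id": 2, "num_meds": 7}]

training_out = apply_recipe(recipe, training)
scoring_out = apply_recipe(recipe, scoring)

# Apart from the target, both outputs expose the same columns.
print(set(training_out[0]) - {"readmitted"} == set(scoring_out[0]))  # True
```

Defining the recipe once and applying it to both inputs is what keeps the training and scoring columns aligned.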

4. Compute a new feature

The recipe panel lists a variety of possible wrangling operations. Click Compute new feature to create a new output feature from existing dataset features, perhaps one that better represents your business problem.

The f(x) feature configuration window is where you add functions and subqueries that define the new feature. Enter the name and expression listed below and click Add to recipe when done. The transformation converts the age range into a single integer.

New feature name: convert_age_range_to_integer

Expression: to_number(REGEXP_SUBSTR("age", '\\[(\\d+)-\\d+\\)', 1, 1, 'e'))
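To see what the expression does, here is a rough Python equivalent. It assumes that age values look like "[70-80)", which is inferred from the pattern rather than stated in the walkthrough, and the function name age_range_to_integer is hypothetical. The doubled backslashes in the expression appear to be string-literal escapes, so the underlying pattern is \[(\d+)-\d+\), and the 'e' argument asks for the captured group (the lower bound) instead of the whole match.

```python
import re

# Same pattern as in the expression above: capture the digits that follow "[".
AGE_RANGE = re.compile(r"\[(\d+)-\d+\)")

def age_range_to_integer(age):
    """Return the lower bound of a range such as "[70-80)", or None if there is no match."""
    match = AGE_RANGE.search(age)
    return int(match.group(1)) if match else None

print(age_range_to_integer("[70-80)"))  # 70
print(age_range_to_integer("[0-10)"))   # 0
print(age_range_to_integer("unknown"))  # None
```

Values that do not match the pattern yield None in this sketch, which plays the role of a null in the wrangled output.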

5. Prepare for publishing

When you are finished adding operations, confirm from the live sample that the applied operations are ready for publishing. Click Publish recipe to configure the final publishing settings for the output dataset.

Set the criteria for the final output dataset, such as the name and, if enabled, the specifics of automatic downsampling. Click Publish to apply the recipe to the source, creating a new output dataset, registering it in the Data Registry, and finally, adding it to your Use Case.

6. Explore the new dataset

The transformed, published dataset, identifiable by the wrangling timestamp, has been added to the Use Case’s Data tab. Click the dataset to see the final feature set, including the new, wrangled feature, and explore feature insights.

If the dataset needs further modification, you can choose to keep wrangling. Otherwise, from the new output dataset, click Start > Modeling to set up a new experiment.

7. Create an experiment

After DataRobot prepares the dataset, enter the name of the column in the dataset that you would like to make predictions for (this is the target). For this Use Case, enter the target feature name Readmitted. DataRobot presents the target feature’s distribution in a histogram. The right panel summarizes the experiment settings. The list of features shown reflects the selected feature list.

8. Apply optional settings

Click Next to further refine your experiment.

DataRobot sets default partitioning and validation based on your data. However, changing experiment parameters is a good way to iterate on a Use Case. Notice the experiment summary information in the right panel. Click Start modeling to launch Autopilot.
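For readers new to partitioning, the general idea can be sketched as follows. This is a generic illustration of a holdout set plus cross-validation folds; the function name partition and the values used (20% holdout, 5 folds) are assumptions chosen for the example and are not presented as DataRobot's defaults.

```python
import random

def partition(n_rows, holdout_fraction=0.2, n_folds=5, seed=0):
    """Split row indices into a holdout set and n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_holdout = int(n_rows * holdout_fraction)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]
    # Deal the remaining rows into folds round-robin.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds

holdout, folds = partition(100)
print(len(holdout), [len(fold) for fold in folds])  # 20 [16, 16, 16, 16, 16]
```

Changing the fractions or the number of folds is one example of the kind of experiment parameter you might iterate on.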

9. Start modeling

Once modeling starts, Workbench begins to construct a model Leaderboard. Ultimately, DataRobot will select and retrain the most accurate model and mark it as prepared for deployment. While model building progresses, click any completed model and familiarize yourself with the insights available for model evaluation. The Overview page displays available insights for the model, which differ depending on the experiment type.

Click Feature Impact, and compute if prompted, to visualize which features are driving model decisions.
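One common, model-agnostic way to estimate how much a feature drives predictions is permutation importance: shuffle one feature's values and measure how much accuracy drops. The sketch below illustrates that general technique only; it does not claim to reproduce how DataRobot computes Feature Impact. The toy model, the feature names (num_inpatient, noise), and the data are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)

def permutation_impact(model, rows, labels, feature, seed=0):
    """Drop in accuracy after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    values = [row[feature] for row in rows]
    rng.shuffle(values)
    shuffled = [{**row, feature: value} for row, value in zip(rows, values)]
    return accuracy(model, rows, labels) - accuracy(model, shuffled, labels)

def toy_model(row):
    """Hypothetical model: predicts readmission from one feature and ignores the rest."""
    return row["num_inpatient"] > 1

rows = [{"num_inpatient": n, "noise": n * 7 % 5} for n in range(6)]
labels = [toy_model(row) for row in rows]

for feature in ("num_inpatient", "noise"):
    print(feature, permutation_impact(toy_model, rows, labels, feature))
```

Because the toy model never reads noise, shuffling that feature leaves every prediction unchanged and its impact is exactly 0.0, while the impact of num_inpatient depends on how the shuffle lands.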

10. View the modeling pipeline

Now click Blueprint to view the pre- and post-processing steps that go into building a model.

Next steps

When you are done investigating, you can:

Take action from the Leaderboard.
Try the operate and govern walkthrough.

From Model actions, you can access a variety of next steps for your model.

Action	Description	Read more
Register model	Create versioned deployment-ready model packages.	Operate and govern walkthrough
Make predictions	Make one-time predictions on new data, registered data, or training data to validate Leaderboard models.	Make predictions from Workbench
Create app	Use No-Code AI App templates to build applications, using a no-code interface, that enable core DataRobot services and are shareable with other users, whether or not they have a DataRobot license.	Create an application
Generate compliance report	Compile and download model development documentation that can be used for regulatory validation. Blue italic text provides guidance and instruction; black text indicates automatically generated model compliance text—preprocessing, performance, impact, task-specific, and general model information.	Also available from Registry
Delete model	Permanently remove the selected model from the Use Case (and the associated Leaderboard).	N/A

Register and monitor deployed models in the operate and govern walkthrough.

Updated October 25, 2024