
Model Repository

The Repository is a library of modeling blueprints available for a selected project. These blueprints illustrate the algorithms used to build a model, not the model itself. Models listed in the Repository have not necessarily been built yet. The list includes models that could be built in any of the modeling modes. When you create a project in Manual mode and want to select a specific blueprint to run, you access it from the Repository.

When you choose Autopilot as the modeling mode, DataRobot runs a sample of the models that provides a good balance of accuracy and runtime. Blueprints that might improve accuracy but could also increase runtime (many deep learning models, for example) are available from the Repository but are not run as part of Autopilot.

It is a good practice to run Autopilot, identify the algorithm that performed best on the data, and then run all variants of that algorithm in the Repository. Comprehensive mode runs all models from the Repository at the maximum sample size, which can be quite time consuming.
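The practice described above can be sketched in a few lines. This is only an illustration, not DataRobot's API: the `leaderboard` and `repository` records, the `family` field, and the scores are hypothetical values invented for the example, and a lower-is-better metric is assumed by default.

```python
def variants_to_run(leaderboard, repository, lower_is_better=True):
    """List unbuilt Repository blueprints from the best-scoring algorithm family."""
    pick = min if lower_is_better else max
    best = pick(leaderboard, key=lambda row: row["score"])
    built = {row["blueprint"] for row in leaderboard}
    return [
        bp["blueprint"]
        for bp in repository
        if bp["family"] == best["family"] and bp["blueprint"] not in built
    ]


# Hypothetical Autopilot results (scores are invented for illustration).
leaderboard = [
    {"blueprint": "H2O Random Forest Regressor", "family": "random forest", "score": 0.41},
    {"blueprint": "Example linear blueprint", "family": "linear", "score": 0.47},
]

# Hypothetical view of the Repository, tagged with the same family labels.
repository = [
    {"blueprint": "H2O Random Forest Regressor", "family": "random forest"},
    {"blueprint": "Spark ML Random Forest Regressor", "family": "random forest"},
    {"blueprint": "H2O GLM Regressor", "family": "linear"},
]

candidates = variants_to_run(leaderboard, repository)
print(candidates)
```

In this sketch the candidates are the remaining blueprints of the winning family, which you would then select in the Repository and run.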

From the Repository you can:

  • Search to limit the list of models displayed by type.
  • Use Preview to display a model's blueprint or code.
  • Start a model run with new parameters.
  • Start a batch run that sets new parameters and applies them to the selected models.

Search the Repository

To more easily find one of the model types described below, or to sort by model type, use the Search function:

Simply click in the search box and begin typing a model type, model name, or badge name. As you type, the list automatically narrows to those models meeting your search criteria. To return to the complete model listing, remove all characters from the search box.
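The narrowing behavior described above can be approximated as a filter over the model list. The sketch below is an assumption about how such a search might work (case-insensitive substring matching), not a description of DataRobot's implementation; `RepositoryEntry`, `search_repository`, and the sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryEntry:
    """Hypothetical stand-in for one Repository row (not a DataRobot type)."""
    name: str
    model_type: str
    badges: tuple = ()


def search_repository(entries, query):
    """Return entries whose name, model type, or any badge contains the query.

    Matching is case-insensitive (an assumption); an empty query returns
    the complete listing.
    """
    needle = query.strip().lower()
    if not needle:
        return list(entries)
    return [
        e for e in entries
        if needle in e.name.lower()
        or needle in e.model_type.lower()
        or any(needle in badge.lower() for badge in e.badges)
    ]


# Illustrative entries; the last one is an invented placeholder.
entries = [
    RepositoryEntry("H2O Random Forest Regressor", "Random Forest", ("open source",)),
    RepositoryEntry("Spark ML Linear Regression", "Linear Regression", ("open source",)),
    RepositoryEntry("Example DataRobot blueprint", "Gradient Boosting", ("datarobot model",)),
]

# Simulate typing: each additional character narrows (or keeps) the list.
for typed in ("s", "sp", "spark"):
    print(typed, [e.name for e in search_repository(entries, typed)])
```

Clearing the query restores the full list, which mirrors removing all characters from the search box.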

Model types

The Repository contains:

  • DataRobot models
  • open-source models

All models—custom or DataRobot-produced—are fit under the same cross-validation framework so that you can rank custom models against the rest of the models in the Leaderboard. See the end-to-end model fitting procedures in the model blueprints.

Note

Existing blueprints created (and/or shared) in an earlier release using the deprecated Jupyter notebook functionality are still available and can be found by searching on the “My Task” or “Shared” badge. Note that you cannot create new or edit existing Jupyter user models.

DataRobot models

DataRobot models are built using massively parallel processing to train and evaluate thousands of choices, mostly built on open-source algorithms (because open source has some of the best algorithms available). DataRobot searches through millions of possible combinations of algorithms, preprocessing steps, features, transformations, and tuning parameters to deliver the best models for your dataset and prediction target. It is this preprocessing and tuning that produces the best models possible. DataRobot models are marked with the DataRobot icon.

List DataRobot models

You can view a list of DataRobot models from the Repository (or the Leaderboard) by searching on the string "datarobot model".

Open source models

Because DataRobot often adds functionality to the existing implementations in open source, the version run by DataRobot may not match the version available in a stock open-source installation. DataRobot does, however, build some unmodified open-source models during Autopilot.

List open source models

You can view a list of open source models by searching on the string "open source" from the Repository or Leaderboard. DataRobot denotes an open source model with a badge under the model name:

Regression problems

The following models are available in the Repository for regression problems:

  • Spark ML Linear Regression
  • Spark ML Random Forest Regressor
  • H2O GLM Regressor
  • H2O Deep Learning Regressor
  • H2O Gradient Boosted Regressor
  • H2O Random Forest Regressor

The following models may be run as part of Autopilot for regression problems:

  • Spark ML Random Forest Regressor
  • H2O Random Forest Regressor

Binary classification problems

The following models are available in the Repository for binary classification problems:

  • Spark ML Logistic Regression
  • Spark ML Random Forest Classifier
  • H2O GLM Classifier
  • H2O Deep Learning Classifier
  • H2O Gradient Boosted Classifier
  • H2O Random Forest Classifier

The following models may be run as part of Autopilot for binary classification problems:

  • Spark ML Random Forest Classifier
  • H2O Random Forest Classifier

Note

These models can only be run on projects created for DataRobot 4.0 and later. They are not backwards compatible.
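The open-source lists above imply which models are listed for the Repository but not for Autopilot. A small sketch, using only the model names given in this section (the dictionary layout and the `repository_only` helper are illustrative, not part of DataRobot):

```python
# Open-source models listed as available in the Repository, by problem type.
REPOSITORY = {
    "regression": [
        "Spark ML Linear Regression",
        "Spark ML Random Forest Regressor",
        "H2O GLM Regressor",
        "H2O Deep Learning Regressor",
        "H2O Gradient Boosted Regressor",
        "H2O Random Forest Regressor",
    ],
    "binary": [
        "Spark ML Logistic Regression",
        "Spark ML Random Forest Classifier",
        "H2O GLM Classifier",
        "H2O Deep Learning Classifier",
        "H2O Gradient Boosted Classifier",
        "H2O Random Forest Classifier",
    ],
}

# Open-source models listed as possibly run by Autopilot, by problem type.
AUTOPILOT = {
    "regression": ["Spark ML Random Forest Regressor", "H2O Random Forest Regressor"],
    "binary": ["Spark ML Random Forest Classifier", "H2O Random Forest Classifier"],
}


def repository_only(problem_type):
    """Models listed for the Repository but not in the Autopilot list."""
    autopilot = set(AUTOPILOT[problem_type])
    return [m for m in REPOSITORY[problem_type] if m not in autopilot]


print(repository_only("regression"))
print(repository_only("binary"))
```

Under these lists, the models returned are the ones you would need to start manually from the Repository.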

Create a new model

To create a new model from the Repository:

  1. Select the model to run by either marking the check box next to the model name or selecting Add from the corresponding dropdown:

  2. Once selected, modify one or more of the fields in the now-enabled dialog box:

    Element        Description
    Feature list   From the dropdown, select a new feature list. The options include the default lists and any lists you created.
    Sample Size    Modify the sample size, making it either larger or smaller than the sample size that Autopilot ran by default. Do not enter a sample size that full Autopilot already ran by default.
    CV Runs        Set the number of folds used in cross-validation.
  3. After verifying the parameter settings, click Run Task(s) to launch the new model run.

Launch a batch run

The Repository batch run capability allows you to set model run parameters and apply them to selected, individual models. To launch a batch run, select the model(s) to run in batch by either clicking in the box next to the model name or selecting all by clicking the box next to Blueprint Name & Description.

To deselect all selected models, click the minus sign (-) next to Blueprint Name & Description.

If you have already built any of the models in the batch using the same sample size and feature list, you must change at least one of the run parameters (described in Create a new model). This is not required for batches containing only new models. Click Run Task(s) to start the build.
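The rule about previously built models can be pictured as a duplicate check on the combination of blueprint, feature list, and sample size. The sketch below is an assumption about the logic, not DataRobot's implementation; `RunKey`, `blocked_models`, the feature list name, and the sample sizes are hypothetical.

```python
from typing import NamedTuple


class RunKey(NamedTuple):
    """Hypothetical identity of a finished model build."""
    blueprint: str
    feature_list: str
    sample_pct: float


def blocked_models(batch, feature_list, sample_pct, already_built):
    """Return blueprints in the batch that were already built with these settings."""
    return [
        blueprint for blueprint in batch
        if RunKey(blueprint, feature_list, sample_pct) in already_built
    ]


# One model was previously built at a 64% sample on an example feature list.
already_built = {RunKey("H2O GLM Regressor", "example list", 64.0)}
batch = ["H2O GLM Regressor", "H2O Deep Learning Regressor"]

# Same settings: the previously built model blocks the batch.
print(blocked_models(batch, "example list", 64.0, already_built))

# Changing one parameter (here the sample size) clears the conflict.
print(blocked_models(batch, "example list", 80.0, already_built))
```

A batch with no entries in the returned list corresponds to the case where no parameter change is needed.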

Use the menu to preview a model—either the blueprint or, for open-source models, the code.

Use the Add function to select the model and add it to the task list, which runs when you click Run Task(s).


Updated October 26, 2021