Add training data to a custom model

To enable feature drift tracking for a model deployment, you must add training data by assigning it to a model version. For unstructured custom inference models, you must upload the training and holdout datasets separately, and these datasets cannot include a partition column.

File size warning

The file size limit for custom model training data uploaded to DataRobot is 1.5GB.
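
As a quick pre-upload check, the limit above can be tested locally before you start the upload. The following is a minimal sketch, not part of DataRobot's tooling: the function name `fits_upload_limit` is hypothetical, and it assumes "1.5GB" means 1.5 × 10^9 bytes, which the text above does not confirm.

```python
import os

# Assumption: "1.5GB" is read as 1.5 * 10**9 bytes. The documentation does not
# say whether decimal gigabytes or GiB are meant; this is the stricter reading.
MAX_TRAINING_DATA_BYTES = int(1.5 * 10**9)


def fits_upload_limit(path, limit_bytes=MAX_TRAINING_DATA_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes
```

For example, `fits_upload_limit("training.csv")` returns `False` for a file that exceeds the assumed limit, in which case you may need to sample or reduce the dataset before uploading.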

To assign training data to a custom model version:

  1. In Model Registry > Custom Model Workshop, in the Models list, select the model you want to add training data to.

  2. On the Assemble tab, next to Datasets:

    • If the model version doesn't have training data assigned, click Assign.

    • If the model version does have training data assigned, click the edit icon, and in the Change Training Data dialog box, click the delete icon to remove the existing training data.

  3. In the Add Training Data (or Change Training Data) dialog box, click and drag a training dataset file into the Training Data box, or click Choose file and do either of the following:

    • Click Local file, select a file from your local storage, and then click Open.

    • Click AI Catalog, select a training dataset you previously uploaded to DataRobot, and click Use this dataset.

    Include features required for scoring

    The columns in a custom model's training data determine which features are included in scoring requests to the deployed custom model. Once training data is available, any features not included in the training dataset aren't sent to the model. When you assemble a custom model in the NextGen experience, you can disable this behavior using the Column filtering setting, which is available as a preview feature.

  4. (Optional) Specify the name of the column that contains partitioning information for your data (based on training/validation/holdout partitioning). If you plan to deploy the custom model and monitor its data drift and accuracy, specify the holdout partition in the column to establish an accuracy baseline.

    Specify partition column

    You can track data drift and accuracy without specifying a partition column; however, in that scenario, DataRobot won't have baseline values. The selected partition column should only include the values T, V, or H.

  5. When the upload is complete, click Add Training Data.

    Training data assignment error

    If the training data assignment fails, an error message appears in the new custom model version under Datasets. While this error is active, you can't create a model package to deploy the affected version. To resolve the error and deploy the model package, reassign training data to create a new version, or create a new version and then assign training data.
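
Two of the rules in the steps above can be checked locally before uploading a CSV training file: the partition column should contain only T, V, or H, and features missing from the training data aren't sent to the deployed model. The sketch below illustrates both under stated assumptions. The function names are hypothetical, the file is assumed to be a CSV with a header row, and the filtering function only mimics the documented default behavior; it is not DataRobot's implementation.

```python
import csv

# Allowed partition values, as stated in the "Specify partition column" note.
ALLOWED_PARTITION_VALUES = {"T", "V", "H"}


def invalid_partition_values(csv_path, partition_column):
    """Return the set of values in `partition_column` other than T, V, or H."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if partition_column not in (reader.fieldnames or []):
            raise KeyError(f"column {partition_column!r} not found")
        return {row[partition_column] for row in reader} - ALLOWED_PARTITION_VALUES


def filter_to_training_features(scoring_row, training_columns):
    """Keep only the scoring features that also appear in the training data.

    Illustrates the default column filtering described in step 3.
    """
    return {
        name: value
        for name, value in scoring_row.items()
        if name in training_columns
    }
```

An empty set from `invalid_partition_values` means the column holds only the expected values. Comparing a sample scoring row against the training columns with `filter_to_training_features` shows which features would be dropped, so you can add them to the training dataset before assigning it.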


Updated April 23, 2024