You can add data about a deployment and configure monitoring, notifications, and challenger behavior using the Settings tab.
Use the Data tab
Use the Data tab to add learning, inference, and outcome data that you did not provide when you created the deployment. Adding this data enables more deployment capabilities; for example, feature drift tracking requires training data, and the Accuracy tab requires actuals. This tab uses the same interface as the one for adding a new deployment.
If you edit any fields in the Data tab, click Save Changes when you finish editing.
Note that data cannot be altered once it has been added to a deployment.
The Model section displays information about the model used to make predictions for your deployment. DataRobot populates these fields from the deployment's files and information, so they are grayed out and cannot be edited.
| Field | Description |
|---|---|
| Model | The name of your model. |
| Target | The name of the dataset column the model predicts. |
| Prediction type | The type of prediction the model makes, for example, Regression, Classification, Multiclass, Anomaly Detection, or Clustering. |
The Fairness section allows you to define Bias and Fairness settings for your deployment to identify any biases in the model's predictive behavior. If fairness settings are defined prior to deploying a model, the fields are automatically populated. For additional information, see the section on defining fairness tests.
| Field | Description |
|---|---|
| Protected features | The dataset columns to measure the fairness of model predictions against; must be categorical. |
| Primary fairness metric | The statistical measure of parity constraints used to assess fairness. |
| Favorable target outcome | The outcome value perceived as favorable for the protected class relative to the target. |
| Fairness threshold | Measures whether a model performs within appropriate fairness bounds for each protected class. |
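To make the fairness fields concrete, here is a minimal Python sketch of how a fairness threshold can be applied. It assumes proportional parity (each class's favorable-outcome rate divided by the highest class's rate) as the primary fairness metric and an illustrative threshold of 0.8. The column names `gender` and `loan_approved` and both function names are hypothetical, and this is not DataRobot's implementation.

```python
from collections import defaultdict


def proportional_parity_scores(rows, protected_feature, outcome_column, favorable_outcome):
    """Score each protected class by its favorable-outcome rate relative to the most-favored class."""
    favorable = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        group = row[protected_feature]
        totals[group] += 1
        if row[outcome_column] == favorable_outcome:
            favorable[group] += 1
    rates = {group: favorable[group] / totals[group] for group in totals}
    best = max(rates.values())
    if best == 0:
        # No class receives the favorable outcome, so all classes are at parity.
        return {group: 1.0 for group in rates}
    return {group: rate / best for group, rate in rates.items()}


def classes_below_threshold(scores, threshold=0.8):
    """Return the protected classes whose score falls below the fairness threshold."""
    return sorted(group for group, score in scores.items() if score < threshold)


# Hypothetical example: "yes" is the favorable outcome and "gender" is the protected feature.
example_rows = [
    {"gender": "F", "loan_approved": "yes"},
    {"gender": "F", "loan_approved": "no"},
    {"gender": "M", "loan_approved": "yes"},
    {"gender": "M", "loan_approved": "yes"},
]
scores = proportional_parity_scores(example_rows, "gender", "loan_approved", "yes")
print(scores)  # {'F': 0.5, 'M': 1.0}
print(classes_below_threshold(scores))  # ['F']
```

Here class `F` is approved half as often as class `M`, so its score of 0.5 falls below the 0.8 threshold and it is flagged.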
The Inference section provides details about your deployment's inference (also known as scoring) data: the data that contains prediction requests and results from the model.
| Field | Description |
|---|---|
| Inference data | DataRobot stores a deployment's inference data when a deployment is created. It cannot be uploaded separately. |
| Prediction timestamp | Determines how prediction rows are time-stamped: either with the time of the prediction request or with a date/time feature (e.g., forecast date) provided with the prediction data. Forecast date time-stamping is set automatically for time series deployments; it provides a common time axis between the training data and the basis for data drift and accuracy statistics. This setting cannot be changed after the deployment is created and predictions are made. |
| Prediction environment | The environment where predictions are generated. Prediction environments allow you to establish access controls and approval workflows. |
| Association ID | The name of the column in the prediction dataset that contains the association ID for your model. Association IDs are required to set up accuracy tracking in a deployment. The association ID serves as an identifier for predictions in your prediction dataset, so you can later match outcome data (also called "actuals") with those predictions. Note that the Create deployment button is inactive until you enter an association ID or turn off this toggle. |
| Enable target monitoring | Configures DataRobot to track target drift in a deployment. Target monitoring is required for accuracy monitoring. |
| Enable feature drift tracking | Configures DataRobot to track feature drift in a deployment. Training data is required for feature drift tracking. |
| Enable prediction rows storage for challenger analysis | Enables the use of challenger models, which allow you to compare models post-deployment and replace the champion model if necessary. |
| Track attributes for segmented analysis of training data and predictions | When enabled, allows DataRobot to monitor deployment predictions by segments, for example, by categorical features. |
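The two Prediction timestamp options can be sketched as a small Python function: stamp each row with the request time, or read the value of a configured date/time feature. The function name, the `forecast_date` column, and the assumption that the feature holds ISO 8601 strings are all hypothetical; this only illustrates the choice the setting makes, not how DataRobot stores timestamps.

```python
from datetime import datetime, timezone
from typing import Optional


def prediction_timestamp(row: dict, request_time: datetime,
                         timestamp_feature: Optional[str] = None) -> datetime:
    """Pick the timestamp for one prediction row.

    With no timestamp feature configured, the row is stamped with the time the
    prediction request was received; otherwise the feature's value is used.
    """
    if timestamp_feature is None:
        return request_time
    # Assumes the feature holds ISO 8601 strings such as "2024-03-01T00:00:00+00:00".
    return datetime.fromisoformat(row[timestamp_feature])


request_time = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
row = {"forecast_date": "2024-03-01T00:00:00+00:00"}
print(prediction_timestamp(row, request_time))                   # request time
print(prediction_timestamp(row, request_time, "forecast_date"))  # forecast date
```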
Data drift tracking
After a model is deployed, the prediction data may differ from the dataset used for training and validation.
How does DataRobot track drift?
For data drift, DataRobot tracks:
- Target drift: DataRobot stores statistics about predictions to monitor how the distribution and values of the target change over time. As a baseline for comparing target distributions, DataRobot uses the distribution of predictions on the holdout.
- Feature drift: DataRobot stores statistics about predictions to monitor how the distributions and values of features change over time. As a baseline for comparing distributions of features:
  - For training datasets larger than 500 MB, DataRobot uses the distribution of a random sample of the training data.
  - For training datasets smaller than 500 MB, DataRobot uses the distribution of 100% of the training data.
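The feature-drift baseline rule above can be sketched in Python as a size check followed by either a random sample or the full dataset. The function name and `sample_size` are hypothetical (the text does not state the sample size), reading "MB" as 1024 × 1024 bytes is an assumption, and a dataset of exactly 500 MB is not covered by the text, so the sketch simply treats it as "not larger".

```python
import random

# 500 MB threshold from the text; interpreting "MB" as 1024 * 1024 bytes is an assumption.
SIZE_LIMIT_BYTES = 500 * 1024 * 1024


def drift_baseline(training_rows, dataset_size_bytes, sample_size=10_000, seed=0):
    """Return the rows used as the feature-drift baseline.

    `sample_size` is a hypothetical value. A dataset of exactly 500 MB is
    treated as "not larger" here, because the text does not cover that case.
    """
    if dataset_size_bytes > SIZE_LIMIT_BYTES:
        rng = random.Random(seed)  # seeded so the sample is reproducible
        return rng.sample(training_rows, min(sample_size, len(training_rows)))
    return list(training_rows)


rows = list(range(100))
print(len(drift_baseline(rows, 1_000)))                                # 100: full data
print(len(drift_baseline(rows, 600 * 1024 * 1024, sample_size=10)))   # 10: random sample
```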
DataRobot monitors both target and feature drift information by default and displays results in the Data Drift dashboard. Use the Enable target monitoring and Enable feature drift tracking toggles to turn off tracking if, for example, you have sensitive data that should not be monitored in the deployment.
The Enable target monitoring setting is required to enable accuracy monitoring.
You can customize how data drift is monitored. See the data drift page for more information on customizing data drift status for deployments.
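The text does not say which statistic DataRobot uses to score drift. As one common choice, the Population Stability Index (PSI) compares a baseline distribution with the current one; larger values indicate more drift. This Python sketch assumes a categorical feature (numeric features would first need binning, not shown), and the `eps` floor that avoids `log(0)` is an illustrative choice.

```python
import math
from collections import Counter


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two samples of a categorical feature."""
    categories = set(baseline) | set(current)
    base_counts, cur_counts = Counter(baseline), Counter(current)
    total = 0.0
    for category in categories:
        # Floor proportions at eps so a category missing from one side does not produce log(0).
        p = max(base_counts[category] / len(baseline), eps)
        q = max(cur_counts[category] / len(current), eps)
        total += (q - p) * math.log(q / p)
    return total


baseline = ["a"] * 50 + ["b"] * 50
print(psi(baseline, baseline))                  # 0.0: no drift
print(psi(baseline, ["a"] * 90 + ["b"] * 10))   # about 0.88: strong drift
```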
Data drift tracking is only available for deployments using deployment-aware prediction API routes (i.e.,
Prediction row storage for challenger analysis
DataRobot can securely store prediction request data at the row level for deployments (not supported for external model deployments). This setting must be enabled for any deployment using the Challengers tab. In addition to enabling challenger analysis, access to stored prediction request rows enables you to thoroughly audit the predictions and use that data to troubleshoot operational issues. For instance, you can examine the data to understand an anomalous prediction result or why a dataset was malformed.
Contact your DataRobot representative to learn more about data security, privacy, and retention measures or to discuss prediction auditing needs.
Enable prediction request row collection either:
- From a deployment's Settings > Data tab, under the Inference header
- During deployment creation
Toggle Enable prediction rows storage for challenger analysis to on. During deployment creation, this toggle appears under the Inference Data section.
Once the setting is enabled, DataRobot collects the prediction requests made to the deployment. Prediction explanations are not stored.
Note that prediction requests are collected only if the prediction data is in a valid format that DataRobot can interpret, such as CSV or JSON. Failed prediction requests with a valid data format (for example, requests with missing input features) are also collected.
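The collection rule above depends on the request format, not on whether scoring succeeds. The hypothetical Python helper below models that rule: a body counts as collectable if it parses as JSON or as CSV with a header and at least one data row. The function name, the content-type strings, and the parsing checks are assumptions for illustration only; DataRobot's actual validation is not described here.

```python
import csv
import io
import json


def is_collectable(payload: str, content_type: str) -> bool:
    """Hypothetical check: would a request body be stored under the rule described above?"""
    if content_type == "application/json":
        try:
            json.loads(payload)
        except json.JSONDecodeError:
            return False
        return True
    if content_type == "text/csv":
        try:
            rows = list(csv.reader(io.StringIO(payload)))
        except csv.Error:
            return False
        # Require a header row plus at least one data row; which features are present is not checked.
        return len(rows) >= 2
    # Any other format is treated as uninterpretable in this sketch.
    return False


# A request that omits a feature the model expects may fail to score,
# but it is still well-formed CSV, so it would be collected.
print(is_collectable("age\n42\n", "text/csv"))          # True
print(is_collectable("{not json", "application/json"))  # False
```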
The Learning section provides details about the deployment's learning (also known as training) data, which is the data used to train and build a model. You can also upload retraining data in the Learning section.
Use the Actuals section to upload a file of actuals so DataRobot can monitor accuracy by matching the model's predictions with actual values. Actuals are required to enable the Accuracy tab. See the documentation on setting up accuracy for details about this section.
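Conceptually, matching actuals to predictions is a join on the association ID. The Python sketch below pairs each prediction with its actual value and computes simple classification accuracy. The column names `association_id`, `prediction`, and `actual_value` and both function names are hypothetical, and predictions without a matching actual are skipped rather than counted as errors.

```python
def match_actuals(predictions, actuals, id_column="association_id"):
    """Pair each prediction with its actual value using the association ID."""
    actual_by_id = {row[id_column]: row["actual_value"] for row in actuals}
    matched = []
    for row in predictions:
        key = row[id_column]
        # Predictions with no actual yet are left out of the accuracy calculation.
        if key in actual_by_id:
            matched.append((row["prediction"], actual_by_id[key]))
    return matched


def accuracy(pairs):
    """Fraction of matched pairs where the prediction equals the actual value."""
    return sum(pred == actual for pred, actual in pairs) / len(pairs) if pairs else float("nan")


predictions = [
    {"association_id": "r1", "prediction": "yes"},
    {"association_id": "r2", "prediction": "no"},
    {"association_id": "r3", "prediction": "yes"},
]
actuals = [
    {"association_id": "r1", "actual_value": "yes"},
    {"association_id": "r3", "actual_value": "no"},
]
pairs = match_actuals(predictions, actuals)
print(pairs)            # [('yes', 'yes'), ('yes', 'no')]
print(accuracy(pairs))  # 0.5
```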
Set prediction intervals for time series deployments
For time series deployments, you can also add a prediction interval to the prediction response. When enabled, prediction intervals are included in the response of every prediction request made to the deployment.
To enable prediction intervals, navigate to the Settings > Prediction Intervals tab.
Toggle Enable prediction intervals to on and select an interval. For background, see the documentation on prediction intervals.
After you set an interval, copy the deployment ID from the deployment URL (or from the snippet in the Prediction API tab). Use the ID to check that the deployment was added to the database, and add it to your deployment prediction script. To verify the results, compare your API output with the prediction preview in the UI.
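If you script the ID lookup, a small helper can pull the deployment ID out of a copied URL. This Python sketch assumes the URL path contains `/deployments/<id>` with a 24-character lowercase hexadecimal ID; the pattern, the `example.com` host, and the function name are assumptions, so check them against a URL from your own installation.

```python
import re
from urllib.parse import urlparse

# Assumed path pattern: /deployments/<24 lowercase hex characters>, followed by "/" or the end of the path.
DEPLOYMENT_PATH = re.compile(r"/deployments/([0-9a-f]{24})(?:/|$)")


def deployment_id_from_url(url: str) -> str:
    """Extract the deployment ID from a deployment URL (hypothetical helper)."""
    match = DEPLOYMENT_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"no deployment ID found in {url!r}")
    return match.group(1)


print(deployment_id_from_url(
    "https://app.example.com/deployments/5f1e2d3c4b5a69788796a5b4/overview"
))  # 5f1e2d3c4b5a69788796a5b4
```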
For more information on working with prediction intervals via the API, access the API documentation by signing in to DataRobot, clicking the question mark on the upper right, and selecting API Documentation. In the API documentation, select Time Series Projects > Prediction Intervals.