Make predictions in Data Prep¶
When you have data that needs to be scored against a deployed Machine Learning (ML) model in DataRobot, the Data Prep Predict tool is how you generate the score.
Work with the Predict tool¶
To access the Predict tool, click the DataRobot icon in the Tools bar and select Predict.
To generate the score, provide your DataRobot API token, which is used to retrieve a list of your DataRobot deployments.
To obtain your token, navigate to User Settings > Developer Tools > API Keys.
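For readers curious about what the token is used for, the following is a minimal sketch of an authenticated request that lists deployments. It only builds the request and does not send it. The host name, the `/api/v2/deployments/` path, and the `Bearer` header scheme are assumptions based on the DataRobot public API convention and may differ for your installation; `build_deployments_request` is a hypothetical helper, not part of Data Prep.

```python
import urllib.request

# Assumed base URL; on-premise or regional installations use a different host.
API_BASE = "https://app.datarobot.com/api/v2"


def build_deployments_request(api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists deployments."""
    return urllib.request.Request(
        f"{API_BASE}/deployments/",
        # Assumption: the API key is passed as a Bearer token.
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )


request = build_deployments_request("YOUR_API_TOKEN")
print(request.full_url)
```

The Predict tool performs this lookup for you; the sketch is only meant to show why the token is required before a deployment can be selected.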
Next, select the deployment. Your data is scored against the model in this deployment. If the model used for scoring is a Time Series model, you must indicate this by checking the Time Series Model checkbox. Then, in the Options tab, specify the Forecast Point and, optionally, the Series Id. See Options for details.
Deployments for custom models are not currently supported.
By default, the new column for the prediction score is created as “Target” in the dataset. To change this name, click the Options tab and provide a different name in the Prediction Column field.
After you select the deployment, the prediction runs. The new column is created and provides the prediction score. The “Target Prediction Value” column is also generated to provide the associated prediction value for each score. For multiclass predictions, the prediction values are returned per class. For example, if classifying images into “apple”, “orange”, or “pear”, then three additional columns are returned, one for each class.
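The multiclass case can be pictured with a short sketch. It is an illustration only: the function name `add_multiclass_columns`, the per-class column naming scheme, and the choice of the highest-scoring class as the predicted label are all assumptions, not the tool's documented behavior.

```python
def add_multiclass_columns(row, class_scores, prediction_column="Target"):
    """Return a copy of row with a predicted label and one value column per class."""
    scored = dict(row)
    # Assumption: the predicted label is the class with the highest value.
    scored[prediction_column] = max(class_scores, key=class_scores.get)
    for label, score in class_scores.items():
        # Hypothetical naming scheme; the actual column names may differ.
        scored[f"{prediction_column} Prediction Value ({label})"] = score
    return scored


row = {"image_id": 17}
scores = {"apple": 0.7, "orange": 0.2, "pear": 0.1}
scored_row = add_multiclass_columns(row, scores)
print(scored_row)
```

With three classes, the row gains one prediction column plus three value columns.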
Examples of use case prediction values¶
Predict whether a hospital patient is likely to be readmitted after discharge. The prediction column will contain a binary value: 1 if the patient is likely to be readmitted, 0 if not.
Classify a set of images into one of three fruits: oranges, pears, or apples. The prediction column will contain one of three values: orange, pear, apple.
Forecast sales based on forecast dates. The prediction column in this case will contain the sales dollar amount.
For binary and time series prediction deployments, the Options tab provides additional options. See Options for details.
For Time Series predictions, you must also provide the forecast point, which is the point in time you are making a prediction from: a relative time, as in “if it was now…”. DataRobot trains models using all potential forecast points in the training data. In production, the forecast point is typically the most recent time.
The date must be in ISO 8601 format, for example 2014-08-12T00:00:00Z.
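If you are producing the forecast point programmatically, a small helper such as the one below yields a string in the required shape. This is a minimal sketch using only the Python standard library; `format_forecast_point` is a hypothetical name, and treating naive datetimes as UTC is an assumption you may need to change.

```python
from datetime import datetime, timezone


def format_forecast_point(moment: datetime) -> str:
    """Format a datetime as a UTC timestamp such as 2014-08-12T00:00:00Z."""
    if moment.tzinfo is None:
        # Assumption: datetimes without a time zone are already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


forecast_point = format_forecast_point(datetime(2014, 8, 12))
print(forecast_point)  # 2014-08-12T00:00:00Z
```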
Optionally, if your dataset contains multiseries data (for example, multiple time series used to forecast sales for multiple stores), you can specify a column as the Series Id to group the data and return the predictions separately for each group.
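Conceptually, a Series Id splits the rows into one history per series. The sketch below illustrates that grouping in plain Python; the column names (`store`, `date`, `sales`) and the helper `group_by_series` are made up for the example and do not reflect how the tool is implemented.

```python
from collections import defaultdict


def group_by_series(rows, series_id_column):
    """Group rows into one list per distinct value of the series id column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[series_id_column]].append(row)
    return dict(groups)


# Hypothetical multiseries data: daily sales for two stores.
rows = [
    {"store": "A", "date": "2014-08-11", "sales": 120.0},
    {"store": "B", "date": "2014-08-11", "sales": 95.0},
    {"store": "A", "date": "2014-08-12", "sales": 130.0},
]
series = group_by_series(rows, "store")
print({store: len(history) for store, history in series.items()})
```

Each group is then forecast on its own, so store A's predictions are not mixed with store B's.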
For binary predictions, the Options tab provides prediction explanations that help you to understand why a prediction was returned, for example, “Why did this patient score a 1 for possibility of readmission?” or “Why was this image identified as an apple?”
When Explanations is enabled, five new columns are generated per explanation in the project:
feature: The name of the feature contributing to the prediction.
feature value: The value the feature took on for the row.
strength: The amount this feature's value affected the prediction.
qualitative: A human-readable description of how strongly the feature affected the prediction. For example: ++++; -; +
label: Describes what output was driven by this prediction explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability, if increased, would correspond to a positive strength of this prediction explanation.
Additionally, you can set Low and High Threshold values so that explanations are generated only for scores that fall outside the thresholds, that is, below the Low Threshold or above the High Threshold.
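The threshold rule can be sketched as a simple predicate. This is an illustration under stated assumptions: `needs_explanation` is a hypothetical name, a score exactly equal to a threshold is treated as inside the range, and leaving both thresholds unset is taken to mean that every row is explained.

```python
def needs_explanation(score, low_threshold=None, high_threshold=None):
    """Return True when a score falls outside the configured thresholds."""
    if low_threshold is None and high_threshold is None:
        # Assumption: with no thresholds set, every row gets explanations.
        return True
    below = low_threshold is not None and score < low_threshold
    above = high_threshold is not None and score > high_threshold
    return below or above


for score in (0.05, 0.50, 0.95):
    print(score, needs_explanation(score, low_threshold=0.1, high_threshold=0.9))
```

Setting thresholds this way limits explanations to the most extreme scores, which are usually the ones worth investigating.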
See Prediction Explanations for complete details on values returned for predictions.