Deployment prediction and training data export for custom metrics
Deployment prediction and training data export is off by default. Contact your DataRobot representative or administrator for information on enabling this feature.
Feature flag: Enable Training and Prediction Data Export for Deployments
Now available as a public preview feature, deployment data export lets you export a deployment's stored training and prediction data (both the scoring data and the prediction results) to compute and monitor custom business or performance metrics outside DataRobot. To export this data, first make sure your deployment stores prediction data. Then, generate and export the prediction or training data, and use it to create custom metrics outside DataRobot.
Export deployment prediction and training data
To export a deployment's stored prediction and training data:
In the top navigation bar, click Deployments.
On the Deployments tab, click the deployment whose stored prediction or training data you want to export.
The Data Export tab is available only if the deployment stores prediction data. In the deployment settings, make sure that Enable prediction rows storage for challenger analysis is turned on.
In the deployment, click the Data Export tab.
Access or download training data
To access or download training data:
Under Training Data, click the open icon to open the training data in the AI Catalog.
Click the download icon to download the training data.
Access or download prediction data
To access or download prediction data:
Configure the following settings to specify the stored prediction data you want to export:
|Setting|Description|
|---|---|
|Model|Select the deployment's model, current or previous, to export prediction data for.|
|Range (UTC)|Select the start and end dates of the period you want to export prediction data from.|
|Resolution|Select the granularity of the date slider. Select from hourly, daily, weekly, and monthly granularity based on the time range selected. If the time range is longer than 7 days, hourly granularity is not available.|
|Reset|Reset the data export settings to the default.|
Under Prediction Data, click Generate Prediction Data.
Prediction data generation considerations
When generating prediction data, consider the following:
You can export up to 200,000 rows. If the time range you set contains more than 200,000 rows of prediction data, decrease the range.
In the AI Catalog, you can have up to 100 prediction export items. If generating prediction data for export would cause the number of prediction export items in the AI Catalog to exceed that limit, delete old prediction export AI Catalog items.
When generating prediction data for time series deployments, two prediction export items are added to the AI Catalog: one for the prediction data and one for the prediction results. The Data Export tab links to the prediction results.
The prediction data export appears in the table below.
After the prediction data is generated:
Click the open icon to open the prediction data in the AI Catalog.
Click the download icon to download the prediction data.
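Once the prediction data is downloaded, a quick sanity check can confirm that the file covers the range you selected and stays within the 200,000-row export limit. The following is a minimal sketch, not part of DataRobot: `summarize_export` is a hypothetical helper, and the in-memory CSV is a stand-in for a real downloaded file. It assumes the export contains the `DR_RESERVED_PREDICTION_TIMESTAMP` column in a format pandas can parse.

```python
import io

import pandas as pd

EXPORT_ROW_LIMIT = 200_000  # export row limit stated in this document
TIMESTAMP_COL = "DR_RESERVED_PREDICTION_TIMESTAMP"


def summarize_export(csv_source, row_limit=EXPORT_ROW_LIMIT):
    """Return the row count, covered time range, and remaining headroom below the limit."""
    df = pd.read_csv(csv_source, parse_dates=[TIMESTAMP_COL])
    return {
        "rows": len(df),
        "start": df[TIMESTAMP_COL].min(),
        "end": df[TIMESTAMP_COL].max(),
        "rows_below_limit": row_limit - len(df),
    }


# Tiny in-memory stand-in for a downloaded export (hypothetical data);
# pass the path to your downloaded CSV file instead.
sample_csv = io.StringIO(
    "DR_RESERVED_PREDICTION_TIMESTAMP,time_in_hospital\n"
    "2023-01-01 00:00:00,3\n"
    "2023-01-02 12:00:00,5\n"
    "2023-01-03 06:30:00,4\n"
)
summary = summarize_export(sample_csv)
print(summary)
```

If the start and end timestamps do not match the range you selected, consider narrowing the range and generating the prediction data again.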
Use exported deployment data for custom metrics
To use the exported deployment data to create your own custom metrics, you can implement a script to read from the CSV file containing the exported data and then calculate metrics using the resulting values, including columns automatically generated during the export process.
This example uses the exported prediction data to calculate and plot the 30-day rolling mean of the time_in_hospital feature, using the DataRobot prediction timestamp (DR_RESERVED_PREDICTION_TIMESTAMP) as the DataFrame index (or row labels). It also uses the exported training data as the plot's baseline:
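```python
import pandas as pd

# Feature to track; replace with any numeric feature in your exported data.
feature_name = "time_in_hospital"
timestamp_col = "DR_RESERVED_PREDICTION_TIMESTAMP"

# Baseline: mean of the feature in the exported training data.
training_df = pd.read_csv("<path_to_training_data_csv>")
baseline = training_df[feature_name].mean()

# Index the prediction data by the DataRobot prediction timestamp.
prediction_df = pd.read_csv("<path_to_prediction_data_csv>")
prediction_df[timestamp_col] = pd.to_datetime(prediction_df[timestamp_col])
# Time-based rolling windows require a sorted (monotonic) datetime index.
predictions = prediction_df.set_index(timestamp_col)[feature_name].sort_index()

# Plot the 30-day rolling mean against the training baseline.
ax = predictions.rolling("30D").mean().plot()
ax.axhline(y=baseline, color="C1", label="training data baseline")
ax.legend()
ax.figure.savefig("feature_over_time.png")
```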
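The same exported data can also feed a numeric custom metric rather than a plot. The sketch below is one possible approach, not a DataRobot-provided method: it computes the daily mean of a feature, its deviation from the training mean, and a flag for days where the deviation exceeds a threshold. The `daily_deviation` helper, the threshold value, and the inline sample data are all hypothetical; in practice, load the exported CSV files with `pd.read_csv`.

```python
import pandas as pd

TIMESTAMP_COL = "DR_RESERVED_PREDICTION_TIMESTAMP"


def daily_deviation(prediction_df, training_df, feature, threshold):
    """Daily mean of `feature`, its deviation from the training mean, and an alert flag."""
    baseline = training_df[feature].mean()
    timestamps = pd.to_datetime(prediction_df[TIMESTAMP_COL])
    daily_mean = (
        prediction_df.assign(**{TIMESTAMP_COL: timestamps})
        .set_index(TIMESTAMP_COL)[feature]
        .resample("D")
        .mean()
    )
    report = pd.DataFrame(
        {"daily_mean": daily_mean, "deviation": daily_mean - baseline}
    )
    # Flag days whose mean drifts from the baseline by more than the threshold.
    report["alert"] = report["deviation"].abs() > threshold
    return report


# Hypothetical stand-ins for the exported training and prediction data.
training_df = pd.DataFrame({"time_in_hospital": [2, 4, 6]})  # training mean is 4
prediction_df = pd.DataFrame(
    {
        TIMESTAMP_COL: ["2023-01-01 08:00", "2023-01-01 20:00", "2023-01-02 09:00"],
        "time_in_hospital": [3, 5, 9],
    }
)
report = daily_deviation(prediction_df, training_df, "time_in_hospital", threshold=2.0)
print(report)
```

Days with no predictions produce an empty daily mean and are not flagged; choose the threshold based on the variation you consider acceptable for the feature.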
DataRobot column reference
DataRobot automatically adds the following columns to the prediction data generated for export:
|DR_RESERVED_PREDICTION_TIMESTAMP|Contains the prediction timestamp.|
||Identifies regression prediction values.|
||Identifies classification prediction values.|
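When computing metrics, it can help to separate the columns DataRobot adds from the original feature columns. The following sketch assumes that every generated column shares the `DR_RESERVED_` prefix; only `DR_RESERVED_PREDICTION_TIMESTAMP` is confirmed on this page, so verify the prefix against the header of your own export. The `split_columns` helper and the second reserved column name in the sample are made up for illustration.

```python
import pandas as pd

# Assumption: generated columns share this prefix, inferred from
# DR_RESERVED_PREDICTION_TIMESTAMP. Check your export's header to confirm.
RESERVED_PREFIX = "DR_RESERVED_"


def split_columns(df, prefix=RESERVED_PREFIX):
    """Split column names into DataRobot-generated columns and feature columns."""
    reserved = [col for col in df.columns if col.startswith(prefix)]
    features = [col for col in df.columns if not col.startswith(prefix)]
    return reserved, features


# Hypothetical stand-in for an exported prediction data file.
sample = pd.DataFrame(
    {
        "DR_RESERVED_PREDICTION_TIMESTAMP": ["2023-03-01 01:00"],
        "DR_RESERVED_HYPOTHETICAL_VALUE": [0.7],  # made-up name, illustration only
        "time_in_hospital": [3],
    }
)
reserved_cols, feature_cols = split_columns(sample)
print(reserved_cols)
print(feature_cols)
```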