Data export

On a deployment's Monitoring > Data export tab, you can download a deployment's stored training data, prediction data, and actuals to compute and monitor custom business or performance metrics on the Custom Metrics tab or outside DataRobot. To export deployment data for custom metrics, verify that the deployment stores prediction data, generate data for a specified time range, and then view or download that data.

To access deployment data export for prediction data, training data, or actuals:

  1. In the deployment from which you want to export stored training data, prediction data, or actuals, click the Monitoring > Data export tab.

    Note

    To access the Data export tab, the deployment must store prediction data. Ensure that prediction row storage is enabled in the deployment's challenger settings. The Data export tab doesn't store or export Prediction Explanations, even if they are requested with the predictions.

  2. Configure the following settings to specify the stored training data, prediction data, or actuals you want to export:

    Setting        Description
    Model          Select the deployment's model, current or previous, to export prediction data for.
    Range (UTC)    Select the start and end dates of the period you want to export prediction data from.
    Resolution     Select the granularity of the date slider. Select from hourly, daily, weekly, and monthly granularity based on the time range selected. If the time range is longer than 7 days, hourly granularity is not available.
    Reset          Reset the data export settings to the default.
  3. Depending on the data available in the deployment, the Actuals, Custom Metric Data, Prediction Data, and Training Data panels appear. In each available panel, generate the data to export:

    • Click Generate Actuals to generate actuals for the specified time range.

    • Click Generate Custom Metric Data to generate available custom metric data for the specified time range.

    • Click Generate Training Data to generate training data for the specified time range.

    • Click Generate Prediction Data to generate prediction data for the specified time range.

    Prediction data and actuals considerations

    When generating prediction data or actuals, consider the following:

    • When generating prediction data, you can export up to 200,000 rows per export. If the time range you set exceeds 200,000 rows of prediction data, decrease the time range.

    • In the Data Registry, you can have up to 100 prediction export items. If generating prediction data for export would cause the number of prediction export items in the Data Registry to exceed that limit, delete old prediction export Data Registry items.

    • When generating prediction data for time series deployments, two prediction export items are added to the Data Registry. One item is for the prediction data, and the other is for the prediction results. The Data export tab links to the prediction results.

    • When generating actuals, you can export up to 1,000,000 rows per export. If the time range you set exceeds 1,000,000 rows of actuals, decrease the time range.

    • In the Data Registry, you can have up to 100 actuals export items. If generating actuals data for export would cause the number of actuals export items in the Data Registry to exceed that limit, delete old actuals export Data Registry items.

    • Up to 10,000,000 actuals are stored for a deployment; therefore, exporting old actuals can result in an error if no actuals are currently stored for that time period.

    The training data appears in the Training Data panel. Prediction data and actuals appear in the table below the panels, identified by Prediction Data or Actuals in the Type column.

  4. After the prediction data, training data, or actuals are generated:

    • Click the open icon to open the generated data in the Data Registry.

    • Click the download icon to download the generated data.

Note

You can also click Use data in Notebook to open a DataRobot notebook with cells for exporting training data, prediction data, and actuals.
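Because each export is capped by row count (200,000 rows for prediction data, as noted in the considerations above), a long period may need to be exported as several shorter ranges. The following is a minimal planning sketch, not part of DataRobot: split_range is a hypothetical helper, and estimated_rows is your own estimate that assumes predictions are spread roughly evenly over time.

```python
import math
from datetime import datetime, timezone

MAX_PREDICTION_ROWS = 200_000  # per-export limit for prediction data


def split_range(start, end, estimated_rows, max_rows=MAX_PREDICTION_ROWS):
    """Split [start, end) into equal windows expected to stay within max_rows.

    Hypothetical helper: estimated_rows is a caller-supplied estimate for the
    full range and assumes rows are spread roughly evenly over time.
    """
    if end <= start:
        raise ValueError("end must be after start")
    pieces = max(1, math.ceil(estimated_rows / max_rows))
    step = (end - start) / pieces
    windows = []
    for i in range(pieces):
        window_start = start + i * step
        # Use the exact end for the last window to avoid rounding drift.
        window_end = end if i == pieces - 1 else start + (i + 1) * step
        windows.append((window_start, window_end))
    return windows


# The Range setting is expressed in UTC, so the example uses UTC datetimes.
start = datetime(2024, 3, 1, tzinfo=timezone.utc)
end = datetime(2024, 3, 31, tzinfo=timezone.utc)
windows = split_range(start, end, estimated_rows=500_000)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Each resulting window can then be entered as a separate Range (UTC) when generating prediction data. If traffic is bursty, a window may still exceed the limit and need to be narrowed further.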

Use exported deployment data for custom metrics

To use the exported deployment data to create your own custom metrics, you can implement a script to read from the CSV file containing the exported data and then calculate metrics using the resulting values, including columns automatically generated during the export process.

This example uses the exported prediction data to calculate and plot the 30-day rolling mean of the time_in_hospital feature, using the DataRobot prediction timestamp (DR_RESERVED_PREDICTION_TIMESTAMP) as the DataFrame index (the row labels). It also uses the mean of the same feature in the exported training data as the plot's baseline:

import pandas as pd

# Feature to track; replace with any numeric feature in your exported data.
feature_name = "time_in_hospital"

# The mean of the feature in the training data serves as the baseline.
training_df = pd.read_csv("<path_to_training_data_csv>")
baseline = training_df[feature_name].mean()

# Index the prediction data by the DataRobot prediction timestamp.
prediction_df = pd.read_csv("<path_to_prediction_data_csv>")
prediction_df["DR_RESERVED_PREDICTION_TIMESTAMP"] = pd.to_datetime(
    prediction_df["DR_RESERVED_PREDICTION_TIMESTAMP"]
)
predictions = prediction_df.set_index("DR_RESERVED_PREDICTION_TIMESTAMP")[feature_name]

# A time-based rolling window requires a sorted (monotonic) index.
ax = predictions.sort_index().rolling("30D").mean().plot()
ax.axhline(y=baseline, color="C1", label="training data baseline")
ax.legend()
ax.figure.savefig("feature_over_time.png")
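
The same comparison can be reduced to a single number that is easier to track as a custom metric than a plot. The sketch below is one possible approach using only the standard library: relative_shift is a hypothetical helper, the sample values are invented for illustration, and it assumes the training mean is nonzero.

```python
from datetime import datetime, timedelta
from statistics import mean


def relative_shift(training_values, timed_values, window_days=30):
    """Relative difference between the recent window mean and the training mean.

    timed_values is a list of (timestamp, value) pairs taken from the
    prediction data. Assumes the training mean is nonzero.
    """
    baseline = mean(training_values)
    latest = max(ts for ts, _ in timed_values)
    cutoff = latest - timedelta(days=window_days)
    # Keep only values inside the window that ends at the latest timestamp.
    recent = [value for ts, value in timed_values if ts > cutoff]
    return (mean(recent) - baseline) / baseline


# Hypothetical values standing in for the exported training and prediction data.
training_values = [4.0, 4.0, 4.0, 4.0]
timed_values = [
    (datetime(2024, 1, 1), 2.0),  # older than the 30-day window, ignored
    (datetime(2024, 3, 1), 4.0),
    (datetime(2024, 3, 15), 6.0),
]
shift = relative_shift(training_values, timed_values)
print(f"{shift:+.1%}")
```

A positive value means the feature's recent mean is above the training baseline; a negative value means it is below.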

DataRobot column reference

DataRobot automatically adds the following columns to the prediction data generated for export:

Column                              Description
DR_RESERVED_PREDICTION_TIMESTAMP    Contains the prediction timestamp.
DR_RESERVED_PREDICTION              Identifies regression prediction values.
DR_RESERVED_PREDICTION_<Label>      Identifies classification prediction values.
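
As an illustration of how these columns might be used, the following sketch averages every reserved prediction column per calendar day. It relies only on the column names listed above. The helper names, the sample rows, and the assumption that timestamps are ISO 8601 strings accepted by datetime.fromisoformat are illustrative and not confirmed by the export format.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

TIMESTAMP_COL = "DR_RESERVED_PREDICTION_TIMESTAMP"
PREDICTION_PREFIX = "DR_RESERVED_PREDICTION"


def prediction_columns(fieldnames):
    """Return the reserved prediction value columns, excluding the timestamp."""
    return [
        name for name in fieldnames
        if name.startswith(PREDICTION_PREFIX) and name != TIMESTAMP_COL
    ]


def daily_mean_predictions(csv_file):
    """Average each prediction column per calendar day of the timestamp."""
    reader = csv.DictReader(csv_file)
    columns = prediction_columns(reader.fieldnames or [])
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in reader:
        # Assumption: timestamps are ISO 8601 strings that fromisoformat accepts.
        day = datetime.fromisoformat(row[TIMESTAMP_COL]).date()
        counts[day] += 1
        for column in columns:
            sums[day][column] += float(row[column])
    return {
        day: {column: sums[day][column] / counts[day] for column in columns}
        for day in sorted(counts)
    }


# Hypothetical sample standing in for an exported regression prediction file.
sample = io.StringIO(
    "DR_RESERVED_PREDICTION_TIMESTAMP,time_in_hospital,DR_RESERVED_PREDICTION\n"
    "2024-03-01T08:00:00,3,0.25\n"
    "2024-03-01T17:30:00,5,0.75\n"
    "2024-03-02T09:15:00,2,0.5\n"
)
daily_means = daily_mean_predictions(sample)
for day, means in daily_means.items():
    print(day, means)
```

For a classification deployment, the same prefix match picks up each DR_RESERVED_PREDICTION_<Label> column, so the result contains one daily mean per label. To run it against a real export, pass an open file object instead of the sample.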

Updated April 3, 2024