# Configure predictions settings

> Configure predictions settings - The Predictions Settings tab provides details about your
> deployment's inference (also known as scoring) data.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:10.038229+00:00` (UTC).

## Primary page

- [Configure predictions settings](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-predictions-settings.html): Full documentation for this topic (HTML).

## Sections on this page

- [Set prediction autoscaling settings for DataRobot serverless deployments](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-predictions-settings.html#set-prediction-autoscaling-settings-for-datarobot-serverless-deployments): In-page section heading.
- [Change secondary datasets for Feature Discovery](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-predictions-settings.html#change-secondary-datasets-for-feature-discovery): In-page section heading.

## Related documentation

- [NextGen UI documentation](https://docs.datarobot.com/en/docs/workbench/index.html): Linked from this page.
- [Console](https://docs.datarobot.com/en/docs/workbench/nxt-console/index.html): Linked from this page.
- [Deployment settings](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/index.html): Linked from this page.
- [Prediction environments](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/prediction-env/index.html): Linked from this page.
- [batch-enabled deployments](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-batch-monitoring.html): Linked from this page.
- [Feature Discovery](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/perform-safer.html): Linked from this page.

## Documentation content

On a deployment's Settings > Predictions tab, you can view details about your deployment's inference (also known as scoring) data—the data containing prediction requests and results from the model.

On the Predictions Settings page, you can access the following information:

| Field | Description |
| --- | --- |
| Prediction environment | Displays the environment where predictions are generated. Prediction environments allow you to establish access controls and approval workflows. |
| Prediction timestamp | Defines the method used for time-stamping prediction rows. To define the timestamp, select one of the following:<br>• Use time of prediction request: The timestamp of the prediction request.<br>• Use value from date/time feature: A date/time feature (e.g., forecast date) provided with the prediction data and defined by the Feature and Date format settings.<br>Forecast date time-stamping is set automatically for time series deployments. It allows a common time axis to be used between training data and the prediction data that forms the basis of data drift and accuracy statistics. The Feature and Date format settings cannot be changed after predictions are made. |
| Batch monitoring | Enables viewing monitoring statistics organized by batch, instead of by time, with batch-enabled deployments. |

> [!NOTE] Time series deployment date/time format considerations
> For time series deployments using the date/time format `%Y-%m-%d %H:%M:%S.%f`, DataRobot automatically populates a `v2` in front of the timestamp format. Date/time values submitted in prediction data should not include this `v2` prefix. Other timestamp formats are not affected.
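As a quick illustration of the `%Y-%m-%d %H:%M:%S.%f` format discussed above, prediction timestamps can be produced and parsed with the Python standard library's `datetime` module (the feature name `forecast_date` is illustrative, not a required column name):

```python
from datetime import datetime

# The date/time format referenced in the note above.
fmt = "%Y-%m-%d %H:%M:%S.%f"

# Format a timestamp for a prediction row's date/time feature.
ts = datetime(2026, 5, 6, 18, 17, 10, 38229)
forecast_date = ts.strftime(fmt)
print(forecast_date)  # 2026-05-06 18:17:10.038229

# Values submitted in prediction data must parse with the same format.
assert datetime.strptime(forecast_date, fmt) == ts
```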

## Set prediction autoscaling settings for DataRobot serverless deployments

Autoscaling automatically adjusts the number of replicas in your deployment based on incoming traffic. During high-traffic periods, it adds replicas to maintain performance. During low-traffic periods, it removes replicas to reduce costs. This eliminates the need for manual scaling while ensuring your deployment can handle varying loads efficiently.

**Basic autoscaling:**
To configure autoscaling, modify the following settings. Note that for DataRobot models, DataRobot performs autoscaling based on CPU usage at a fixed 40% threshold:

![Configure autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-configure.png)

| Field | Description |
| --- | --- |
| Minimum compute instances<br>(Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |

**Advanced autoscaling (custom models):**
To configure autoscaling, select the metric that will trigger scaling:

- **CPU utilization**: Set a threshold for the average CPU usage across active replicas. When CPU usage exceeds this threshold, the system automatically adds replicas to provide more processing power.
- **HTTP request concurrency**: Set a threshold for the number of simultaneous requests being processed. For example, with a threshold of 5, the system adds replicas when it detects 5 concurrent requests being handled.

When your chosen threshold is exceeded, the system calculates how many additional replicas are needed to handle the current load. It continuously monitors the selected metric and adjusts the replica count up or down to maintain optimal performance while minimizing resource usage.
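As a rough sketch of the calculation described above, threshold-driven autoscalers commonly scale the replica count in proportion to how far the observed metric exceeds its target. The function below is illustrative only (its name and the exact formula are assumptions, not DataRobot's actual implementation):

```python
import math

def desired_replicas(current_replicas: int,
                     observed_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Illustrative threshold-proportional scaling: grow or shrink the
    replica count based on the ratio of the observed metric (average CPU %
    or concurrent requests per replica) to the configured target, then
    clamp to the configured minimum/maximum compute instances."""
    if current_replicas == 0:
        # Scale up from zero as soon as any load is observed.
        raw = 1 if observed_metric > 0 else 0
    else:
        raw = math.ceil(current_replicas * observed_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))

# With a concurrency threshold of 5 and 6 in-flight requests per replica
# across 2 replicas, one additional replica is requested.
print(desired_replicas(2, 6, 5, 0, 3))  # 3
```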

Review the settings for CPU utilization below.

![CPU utilization autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-cpu.png)

| Field | Description |
| --- | --- |
| CPU utilization (%) | Set the target CPU usage percentage that triggers scaling. When CPU utilization reaches this threshold, the system adds more replicas. |
| Cool down period (minutes) | Set the wait time after a scale-down event before another scale-down can occur. This prevents rapid scaling fluctuations when metrics are unstable. |
| Minimum compute instances<br>(Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |

Review the settings for HTTP request concurrency below.

![HTTP request concurrency autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-http.png)

| Field | Description |
| --- | --- |
| HTTP request concurrency | Set the number of simultaneous requests required to trigger scaling. When concurrent requests reach this threshold, the system adds more replicas. |
| Cool down period (minutes) | Set the wait time after a scale-down event before another scale-down can occur. This prevents rapid scaling fluctuations when metrics are unstable. |
| Minimum compute instances<br>(Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type; for more information, see the compute instance configurations note. |
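The cool down period setting can be pictured as a simple guard on scale-down events. The sketch below is illustrative (the class name and structure are assumptions, not DataRobot's implementation); it only shows the behavior described above, where one scale-down suppresses further scale-downs for a fixed window:

```python
import time
from typing import Optional

class ScaleDownCoolDown:
    """Illustrative sketch of a scale-down cool down: after one
    scale-down, further scale-downs are blocked for a fixed window so
    noisy metrics don't cause rapid replica churn."""

    def __init__(self, minutes: float):
        self.window = minutes * 60.0
        self.last_scale_down = float("-inf")  # no scale-down yet

    def may_scale_down(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.last_scale_down >= self.window

    def record_scale_down(self, now: Optional[float] = None) -> None:
        self.last_scale_down = time.monotonic() if now is None else now

cd = ScaleDownCoolDown(minutes=5)
cd.record_scale_down(now=0)
print(cd.may_scale_down(now=120))  # False: only 2 minutes have elapsed
print(cd.may_scale_down(now=300))  # True: the 5-minute window has elapsed
```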


> [!NOTE] Premium feature: Always-on predictions
> Always-on predictions are a premium feature. Deployment autoscaling management is required to configure the minimum compute instances setting. Contact your DataRobot representative or administrator for information on enabling the feature.
> 
> Feature flag: Enable Deployment Auto-Scaling Management

> [!NOTE] Compute instance configurations
> For DataRobot model deployments:
>
> - The default minimum is 0 and the default maximum is 3.
> - The minimum and maximum limits are taken from the organization's `max_compute_serverless_prediction_api` setting.
>
> For custom model deployments:
>
> - The default minimum is 0 and the default maximum is 1.
> - The minimum and maximum limits are taken from the organization's `max_custom_model_replicas_per_deployment` setting.
> - The minimum is always greater than 1 when running on GPUs (for LLMs).
>
> Additionally, for high availability scenarios:
>
> - The minimum compute instances setting **must** be greater than or equal to 2.
> - This requires business critical or consumption-based pricing.
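The constraints in the note above can be summarized as a small validation sketch. The function name, parameters, and error messages below are illustrative, not part of any DataRobot API:

```python
def validate_compute_instances(minimum: int, maximum: int,
                               org_limit: int,
                               high_availability: bool = False) -> None:
    """Illustrative check of the compute instance constraints listed
    above: min <= max, max capped by the organization's limit, and a
    minimum of at least 2 for high availability scenarios."""
    if minimum < 0:
        raise ValueError("minimum cannot be negative")
    if high_availability and minimum < 2:
        raise ValueError("high availability requires a minimum of at least 2")
    if maximum < minimum:
        raise ValueError("maximum must be >= minimum")
    if maximum > org_limit:
        raise ValueError("maximum exceeds the organization's configured limit")

# Defaults from the note: DataRobot models 0..3, custom models 0..1.
validate_compute_instances(0, 3, org_limit=3)
validate_compute_instances(0, 1, org_limit=1)
```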

## Change secondary datasets for Feature Discovery

[Feature Discovery](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/perform-safer.html) identifies and generates new features from multiple datasets, so you no longer need to perform manual feature engineering to consolidate multiple datasets into one. This process is based on relationships between datasets and the features within those datasets. DataRobot provides an intuitive relationship editor for building and visualizing these relationships. The Feature Discovery engine analyzes the relationship graph and the included datasets to determine a feature engineering “recipe” and, from that recipe, generates secondary features for training and predictions. When configuring the deployment settings, you can change the selected secondary dataset configuration.

| Setting | Description |
| --- | --- |
| Secondary datasets configurations | Previews the dataset configuration or provides an option to change it. By default, DataRobot makes predictions using the secondary datasets configuration defined when starting the project. Click Change to select an alternative configuration before uploading a new primary dataset. |
