# Add DataRobot Serverless prediction environments

> Add DataRobot Serverless prediction environments - Review the DataRobot prediction environments
> available to you and create DataRobot Serverless prediction environments to make scalable
> predictions with configurable compute instance settings.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.568161+00:00` (UTC).

## Primary page

- [Add DataRobot Serverless prediction environments](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/prediction-env/pred-env.html): Full documentation for this topic (HTML).

## Sections on this page

- [Deploy a model to the DataRobot Serverless prediction environment](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/prediction-env/pred-env.html#deploy-a-model-to-the-datarobot-serverless-prediction-environment): In-page section heading.
- [Make predictions](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/prediction-env/pred-env.html#make-predictions): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [MLOps](https://docs.datarobot.com/en/docs/classic-ui/mlops/index.html): Linked from this page.
- [Deployment](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/index.html): Linked from this page.
- [Manage prediction environments](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/prediction-env/index.html): Linked from this page.
- [include pre-computed prediction intervals when registering the model package](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/registry/dr-model-reg.html): Linked from this page.
- [enabling prediction intervals](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment-settings/predictions-settings.html#set-prediction-intervals-for-time-series-deployments): Linked from this page.
- [configure the deployment settings](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/deploy-methods/add-deploy-info.html): Linked from this page.
- [batch prediction limits](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#limits): Linked from this page.
- [UI batch predictions](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-predictions/nxt-make-predictions.html): Linked from this page.
- [Prediction API scripting predictions](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-predictions/nxt-pred-api-snippets.html#batch-prediction-snippet-settings): Linked from this page.

## Documentation content

# Add DataRobot Serverless prediction environments

On the Prediction Environments page, you can review the DataRobot prediction environments available to you and create DataRobot Serverless prediction environments to make scalable predictions with configurable compute instance settings.

**Managed AI Platform (SaaS):**
> [!NOTE] Pre-provisioned DataRobot Serverless environments
> Organizations created after November 2024 have access to a pre-provisioned DataRobot Serverless prediction environment on the Prediction Environments page.

**Trial:**
> [!NOTE] Pre-provisioned DataRobot Serverless environments
> Trial accounts have access to a pre-provisioned DataRobot Serverless prediction environment on the Prediction Environments page.

**Self-Managed AI Platform:**
> [!NOTE] Pre-provisioned DataRobot Serverless environments
> New Self-Managed organizations running DataRobot 10.2+ installations have access to a pre-provisioned DataRobot Serverless prediction environment on the Prediction Environments page.


> [!WARNING] Prediction intervals in DataRobot serverless prediction environments
> In a DataRobot serverless prediction environment, to make predictions with time-series prediction intervals included, you must [include pre-computed prediction intervals when registering the model package](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/registry/dr-model-reg.html). If you don't pre-compute prediction intervals, the deployment resulting from the registered model doesn't support [enabling prediction intervals](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment-settings/predictions-settings.html#set-prediction-intervals-for-time-series-deployments).

To add a DataRobot Serverless prediction environment:

1. Click **Deployments > Prediction Environments**, and then click **+ Add prediction environment**.
2. In the **Add prediction environment** dialog box, complete the following fields:

   | Field | Description |
   |-------|-------------|
   | Name | Enter a descriptive name for the prediction environment. |
   | Description | (Optional) Enter a description of the external prediction environment. |
   | Platform | Select **DataRobot Serverless**. |
   | Batch jobs: Max Concurrent Jobs | Decrease the maximum number of concurrent jobs for this Serverless environment from the organization's defined maximum. |
   | Batch jobs: Priority | Set the importance of batch jobs on this environment. |

   **How is the maximum concurrent job limit defined?** There are two limits on max concurrent jobs, and these limits depend on the details of your DataRobot installation. Each batch job is subject to both limits, meaning that both conditions must be satisfied for a batch job to run on the prediction environment. The first is the organization-level limit (a default of 30 for Self-Managed installations or 10 for SaaS) defined by an organization administrator; this should be the higher limit. The second is the environment-level limit defined here by the prediction environment creator; this limit should be lower than the organization-level limit.

3. Once you configure the environment settings, click **Add environment**.

The environment is now available from the Prediction Environments page.
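The interaction between the organization-level and environment-level concurrency limits described above can be sketched as follows. This is an illustrative model of the rule, not DataRobot code; the function names are ours.

```python
# Illustrative sketch: how the two concurrent-job limits interact.
# A batch job must satisfy BOTH the organization-level limit and the
# environment-level limit, so the effective cap is the smaller of the two.

def effective_max_concurrent_jobs(org_limit: int, env_limit: int) -> int:
    """Return the number of batch jobs that can run at once."""
    return min(org_limit, env_limit)

def can_start_job(running_jobs: int, org_limit: int, env_limit: int) -> bool:
    """A new batch job starts only if it stays within both limits."""
    return running_jobs < effective_max_concurrent_jobs(org_limit, env_limit)

# SaaS default org-level limit of 10, environment-level limit of 4:
print(effective_max_concurrent_jobs(10, 4))  # 4
print(can_start_job(3, 10, 4))               # True
print(can_start_job(4, 10, 4))               # False
```

This is why the documentation recommends setting the environment-level limit below the organization-level one: an environment-level limit above the organization's maximum would never take effect.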

## Deploy a model to the DataRobot Serverless prediction environment

Using the pre-provisioned DataRobot Serverless environment, or a Serverless environment you created, you can deploy a model to make predictions.

To deploy a model to the DataRobot Serverless prediction environment:

1. On the **Prediction Environments** page, in the **Platform** column, locate the **DataRobot Serverless** prediction environments, and click the environment you want to deploy a model to.
2. On the **Details** tab, under **Usages**, in the **Deployment** column, click **+ Add new deployment**.
3. In the **Select model version from the registry** dialog box, enter the name of the model you want to deploy in the **Search** box, click the model, and then click the **DataRobot** model version you want to deploy.
4. Click **Select model version** and then [configure the deployment settings](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/deploy-methods/add-deploy-info.html).
5. To configure on-demand predictions on this environment, click **Show advanced options**, scroll down to **Advanced Predictions Configuration**, set the following **Autoscaling** options, and then click **Deploy model**:

Autoscaling automatically adjusts the number of replicas in your deployment based on incoming traffic. During high-traffic periods, it adds replicas to maintain performance. During low-traffic periods, it removes replicas to reduce costs. This eliminates the need for manual scaling while ensuring your deployment can handle varying loads efficiently.

**Basic autoscaling:**
To configure autoscaling, modify the following settings. Note that for DataRobot models, DataRobot performs autoscaling based on CPU usage at a 40% threshold:

![Basic autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-configure.png)

| Field | Description |
|-------|-------------|
| Minimum compute instances (Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |

**Advanced autoscaling (custom models):**
To configure autoscaling, select the metric that triggers scaling:

- **CPU utilization**: Set a threshold for the average CPU usage across active replicas. When CPU usage exceeds this threshold, the system automatically adds replicas to provide more processing power.
- **HTTP request concurrency**: Set a threshold for the number of simultaneous requests being processed. For example, with a threshold of 5, the system adds replicas when it detects 5 concurrent requests being handled.

When your chosen threshold is exceeded, the system calculates how many additional replicas are needed to handle the current load. It continuously monitors the selected metric and adjusts the replica count up or down to maintain optimal performance while minimizing resource usage.
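The replica calculation described above can be sketched with the proportional formula used by common autoscalers such as the Kubernetes Horizontal Pod Autoscaler. This is an assumption for illustration, not DataRobot's published algorithm:

```python
import math

# Illustrative sketch (not DataRobot's published algorithm): a common
# autoscaler formula sizes the replica count from the ratio of the
# observed per-replica metric to its target, then clamps to the
# configured minimum and maximum compute instances.

def desired_replicas(current: int, observed: float, target: float,
                     min_r: int, max_r: int) -> int:
    """Scale replicas so the per-replica metric returns to its target."""
    if current == 0:
        return min_r
    desired = math.ceil(current * observed / target)
    return max(min_r, min(max_r, desired))

# CPU at 80% against a 40% target doubles the replica count,
# capped at the configured maximum of 3:
print(desired_replicas(2, observed=80, target=40, min_r=0, max_r=3))  # 3
# CPU at 20% against a 40% target halves it:
print(desired_replicas(2, observed=20, target=40, min_r=0, max_r=3))  # 1
```

The cool down period settings below serve the same purpose as the clamping here: they keep the replica count from oscillating when the metric hovers around its threshold.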

Review the settings for CPU utilization below.

![CPU utilization autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-cpu.png)

| Field | Description |
|-------|-------------|
| CPU utilization (%) | Set the target CPU usage percentage that triggers scaling. When CPU utilization reaches this threshold, the system adds more replicas. |
| Cool down period (minutes) | Set the wait time after a scale-down event before another scale-down can occur. This prevents rapid scaling fluctuations when metrics are unstable. |
| Minimum compute instances (Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |

Review the settings for HTTP request concurrency below.

![HTTP request concurrency autoscaling settings](https://docs.datarobot.com/en/docs/images/nxt-real-time-pred-http.png)

| Field | Description |
|-------|-------------|
| HTTP request concurrency | Set the number of simultaneous requests required to trigger scaling. When concurrent requests reach this threshold, the system adds more replicas. |
| Cool down period (minutes) | Set the wait time after a scale-down event before another scale-down can occur. This prevents rapid scaling fluctuations when metrics are unstable. |
| Minimum compute instances (Premium feature) | Set the minimum compute instances for the model deployment. If your organization doesn't have access to "always-on" predictions, this setting is set to 0 and isn't configurable. With the minimum compute instances set to 0, the inference server is stopped after an inactivity period of 7 days. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |
| Maximum compute instances | Set the maximum compute instances for the model deployment to a value above the currently configured minimum. To limit compute resource usage, set the maximum value equal to the minimum. The minimum and maximum compute instances depend on the model type. For more information, see the compute instance configurations note. |


> [!NOTE] Premium feature: Always-on predictions
> Always-on predictions are a premium feature. Deployment autoscaling management is required to configure the minimum compute instances setting. Contact your DataRobot representative or administrator for information on enabling the feature.
> 
> Feature flag: Enable Deployment Auto-Scaling Management

> [!NOTE] Compute instance configurations
> For DataRobot model deployments:
>
> - The default minimum is 0 and the default maximum is 3.
> - The minimum and maximum limits are taken from the organization's `max_compute_serverless_prediction_api` setting.
>
> For custom model deployments:
>
> - The default minimum is 0 and the default maximum is 1.
> - The minimum and maximum limits are taken from the organization's `max_custom_model_replicas_per_deployment` setting.
> - The minimum is always greater than 1 when running on GPUs (for LLMs).
>
> Additionally, for high availability scenarios:
>
> - The minimum compute instances setting **must** be greater than or equal to 2.
> - This requires business critical or consumption-based pricing.
Depending on the availability of compute resources, it can take a few minutes after deployment for a prediction environment to be available for predictions.

> [!TIP] Update compute instances settings
> If, after deployment, you need to update the number of compute instances available to the model, you can change these settings on the [Predictions Settings](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment-settings/predictions-settings.html) tab.

## Make predictions

After you've created a DataRobot Serverless environment and deployed a model to that environment, you can make real-time or batch predictions.

> [!NOTE] Payload size limit
> The maximum payload size for real-time deployment predictions on Serverless prediction environments is 50MB. For batch predictions, see [batch prediction limits](https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/index.html#limits).
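A conservative pre-flight check against the payload limit above can be sketched as follows. Whether the documented 50MB limit counts encoded JSON bytes exactly is an assumption here; treat this as a rough guard, and route larger datasets through batch predictions:

```python
import json

# 50MB real-time payload limit from the note above, interpreted as bytes.
MAX_REAL_TIME_PAYLOAD_BYTES = 50 * 1024 * 1024

def fits_real_time_limit(records: list) -> bool:
    """Check a payload's encoded size before sending it to the real-time
    endpoint; oversized data should go through batch predictions instead."""
    payload = json.dumps(records).encode("utf-8")
    return len(payload) <= MAX_REAL_TIME_PAYLOAD_BYTES

print(fits_real_time_limit([{"feature_1": 1.0, "feature_2": "a"}]))  # True
```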

**Real-time predictions:**
To make real-time predictions on the DataRobot Serverless prediction environment:

1. In the **Deployments** inventory, locate and open a deployment associated with a DataRobot Serverless environment. To do this, click **Filter**, select **DataRobot Serverless**, and then click **Apply filters**.
2. In a deployment associated with a DataRobot Serverless prediction environment, click **Predictions > Prediction API**.
3. On the **Prediction API Scripting Code** page, under **Prediction Type**, click **Real-time**.
4. Under **Language**, select **Python** or **cURL**, optionally enable **Show secrets**, and click **Copy script to clipboard**.
5. Run the Python or cURL snippet to make a prediction request to the DataRobot Serverless deployment.
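In outline, the copied Python snippet typically POSTs JSON rows to the deployment's prediction endpoint with an API token. The host, endpoint path, and identifiers below are illustrative placeholders; always use the exact script copied from the Prediction API page for your deployment. This sketch only builds and inspects the request without sending it:

```python
import json
import urllib.request

# Placeholders: substitute the values from your copied Prediction API script.
API_TOKEN = "YOUR_API_TOKEN"
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"
# Illustrative endpoint shape; your copied script contains the real URL.
URL = (
    "https://example.datarobot.com/predApi/v1.0/deployments/"
    f"{DEPLOYMENT_ID}/predictions"
)

rows = [{"feature_1": 1.0, "feature_2": "a"}]  # hypothetical feature names

req = urllib.request.Request(
    URL,
    data=json.dumps(rows).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Inspect the prepared request without sending it:
print(req.get_method())              # POST
print("predictions" in req.full_url) # True
# urllib.request.urlopen(req) would actually send the request.
```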

**Batch predictions:**
To make batch predictions on the DataRobot Serverless prediction environment, follow the standard process for [UI batch predictions](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-predictions/nxt-make-predictions.html) or [Prediction API scripting predictions](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-predictions/nxt-pred-api-snippets.html#batch-prediction-snippet-settings).
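For scripted batch predictions, the Batch Prediction API accepts a job configuration naming the deployment plus intake and output settings. The field names below follow the public Batch Prediction API, but verify the schema against the linked limits and snippet pages for your installation; the deployment ID is a placeholder:

```python
import json

# Hedged sketch of a Batch Prediction API job configuration. Field names
# (deploymentId, intakeSettings, outputSettings) follow the public Batch
# Prediction API; check the linked documentation for the full schema.

job_config = {
    "deploymentId": "YOUR_DEPLOYMENT_ID",   # placeholder
    "intakeSettings": {"type": "localFile"},  # score an uploaded file
    "outputSettings": {"type": "localFile"},  # download results as a file
    "passthroughColumnsSet": "all",           # echo input columns in output
}

print(json.dumps(job_config, indent=2))
```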
