Import and deploy with NVIDIA NIM

Premium

The use of NVIDIA Inference Microservices (NIM) in DataRobot requires access to premium features for GenAI experimentation and GPU inference. Contact your DataRobot representative or administrator for information on enabling the required features.

The DataRobot integration with the NVIDIA AI Enterprise Suite enables users to perform one-click deployment of NVIDIA Inference Microservices (NIM) on GPUs in DataRobot Serverless Compute. This process starts in the Registry, where you can import NIM containers from the NVIDIA AI Enterprise model catalog. The registered model is optimized for deployment to Console and is compatible with the DataRobot monitoring and governance framework.

NVIDIA NIM provides optimized foundational models you can add to a playground in Workbench for evaluation and inclusion in agentic blueprints, embedding models used to create vector databases, and NVIDIA NeMo Guardrails used in the DataRobot moderation framework to secure your agentic application.

Import from NVIDIA GPU Cloud (NGC)

On the Models tab in the Registry, create a registered model from the gallery of available NIM models, selecting the model name and performance profile and reviewing the information provided on the model card.

To import from NVIDIA NGC:

  1. On the Registry > Models tab, next to + Register model, open the adjacent dropdown menu and click Import from NVIDIA NGC.

  2. In the Import from NVIDIA NGC panel, on the Select NIM tab, click a NIM in the gallery.

    Search the gallery

    To direct your search, you can Search, filter by Publisher, or click Sort by to order the gallery by date added or alphabetically (ascending or descending).

  3. Review the model information from the NVIDIA NGC source, then click Next.

  4. On the Register model tab, configure the following fields and click Register:

    Registered model name / Registered model: Configure one of the following:
    • Registered model name: When registering a new model, enter a unique and descriptive name for the new registered model. If you choose a name that already exists anywhere within your organization, a warning appears.
    • Registered model: When saving a version of an existing model, select the registered model you want to add the new version to.

    Registered version name: Automatically populated with the model name and the word "version." Modify the default version name as necessary.

    Registered model version: Assigned automatically. Displays the expected version number (e.g., V1, V2, V3) of the version you create. This is always V1 when you select Register as a new model.

    Resource bundle: Recommended automatically. When possible, DataRobot translates the GPU requirements of the selected model into a resource bundle. If DataRobot can't detect a compatible resource bundle, review the documentation for that NIM to identify a bundle with sufficient VRAM.

    NVIDIA NGC API key: Select the credential associated with your NVIDIA NGC API key.

    Optional settings:

    Registered version description: Enter a description of the business problem this model package solves or, more generally, describe the model represented by this version.

    Tags: Click + Add tag and enter a Key and a Value for each key-value pair you want to tag the model version with. Tags added when registering a new model are applied to V1.

Deploy the registered NVIDIA NIM

After the NVIDIA NIM is registered, deploy it to a DataRobot Serverless prediction environment.

To deploy a registered model to a DataRobot Serverless environment:

  1. On the Registry > Models tab, locate and click the registered NIM, and then click the version to deploy.

  2. In the registered model version, review the version information, and then click Deploy.

  3. In the Prediction history and service health section, under Choose prediction environment, verify that the correct prediction environment with Platform: DataRobot Serverless is selected.

    Change DataRobot Serverless environments

    If the correct DataRobot Serverless environment isn't selected, click Change. On the Select prediction environment panel's DataRobot Serverless tab, select a different serverless prediction environment from the list.

  4. Optionally, configure additional deployment settings, and then click Deploy model.

Make predictions with the deployed NVIDIA NIM

After the model is deployed to a DataRobot Serverless prediction environment, you can access real-time prediction snippets from the deployment's Predictions tab. The requirements for running the prediction snippet depend on the model type: text generation or unstructured.

When you add a NIM to the Registry in DataRobot, LLMs are imported as text generation models, allowing you to use the Bolt-on Governance API to communicate with the deployed NIM. Other model types are imported as unstructured models, and the endpoints provided by the NIM containers are exposed for communication with the deployed NIM. This provides the flexibility required to deploy any NIM on GPU infrastructure using DataRobot Serverless Compute.

Text generation (/chat/completions): Deployed text generation NIM models provide access to the /chat/completions endpoint. Use the code snippet provided on the Predictions tab to make predictions.

Unstructured (/directAccess/nim/): Deployed unstructured NIM models provide access to the /directAccess/nim/ endpoint. Modify the code snippet provided on the Predictions tab to provide a NIM URL suffix and a properly formed payload.

Unstructured embedding models (both endpoints): Deployed unstructured NIM embedding models can provide access to both the /directAccess/nim/ and /chat/completions endpoints. Modify the code snippet provided on the Predictions tab to suit your intended usage.

Text generation model endpoints

Access the Prediction API scripting code on the deployment's Predictions > Prediction API tab. For a text generation model, the required endpoint is the base URL of the DataRobot deployment. For more information, see the Bolt-on Governance API documentation.
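For example, the following minimal sketch shows what a chat completion request could look like with the OpenAI Python client (v1 or later). The deployment URL, model name, and prompt below are placeholder assumptions; copy the real values from the snippet on your deployment's Predictions tab.

Example chat completion request (sketch)

from openai import OpenAI

# Placeholders: take the real values from the deployment's Predictions tab.
CHAT_API_URL = "https://app.datarobot.com/api/v2/deployments/<deployment_id>/"
API_KEY = "<your DataRobot API key>"

openai_client = OpenAI(base_url=CHAT_API_URL, api_key=API_KEY)

# The OpenAI client appends /chat/completions to the base URL.
completion = openai_client.chat.completions.create(
    model="datarobot-deployed-llm",  # placeholder model name; use the value from the provided snippet
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
)
print(completion.choices[0].message.content)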

Prediction snippets for Private CA environments

For Self-managed AI Platform installations in a Private Certificate Authority (Private CA) environment, the snippets provided on the Predictions tab may need to be updated, depending on how your organization's IT team configured the Private CA environment.

If your organization's Private CA environment requires modifications to the provided prediction snippet, locate the following code:

Standard Prediction API scripting code
openai_client = OpenAI(
    base_url=CHAT_API_URL,
    api_key=API_KEY,
    _strict_response_validation=False
)

Update the code above, making the following changes to allow the prediction snippet to access the Private CA bundle file:

Private CA Prediction API scripting code
import ssl  # add this import if the snippet doesn't already include it

ctx = ssl.create_default_context(cafile="<full path to CA bundle file>")
http_client = openai.DefaultHttpxClient(verify=ctx)
openai_client = OpenAI(
    base_url=CHAT_API_URL,
    api_key=API_KEY,
    http_client=http_client,
    _strict_response_validation=False
)
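With these changes, the OpenAI client sends requests through an httpx client that verifies TLS connections against your organization's CA bundle rather than the default system trust store.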

Unstructured model endpoints

Access the Prediction API scripting code from the deployment's Predictions > Prediction API tab. For unstructured models, endpoints provided by the NIM containers are exposed to enable communication with the deployed NIM. To determine how to construct the correct endpoint URL and send a request to a deployed NVIDIA NIM instance, refer to the documentation for the registered and deployed NIM, listed below.

Observability for direct access endpoints

Most unstructured models from NVIDIA NIM only provide access to the /directAccess/nim/ endpoint. This endpoint is compatible with a limited set of observability features; for example, accuracy and drift tracking are not supported for the /directAccess/nim/ endpoint.

To use the Prediction API scripting code, perform the following steps and use the send_request function to communicate with the model:

  1. Review the BASE_API_URL. This is the prefix of the endpoint, automatically populated with the deployment's base URL.
  2. Retrieve the appropriate NIM_SUFFIX. This is the suffix of the NIM endpoint. Locate this suffix in the NVIDIA NIM documentation for the deployed model.
  3. Construct the request payload (sample_payload). The payload must be structured according to the model's API specifications in the NVIDIA NIM documentation for the deployed model.
Prediction API scripting code
import requests

# Base API URL for the deployed NIM
BASE_API_URL = '{{ prediction_api_url }}' # noqa

# API Key for authentication
API_KEY = '{{ api_key }}'

# Define the NIM model-specific endpoint suffix (update from Runtime Logs or NIM documentation)
NIM_SUFFIX = 'v1/your-model-endpoint'  # Modify this based on the required model

# Construct the full API URL for the request
DEPLOYMENT_URL = f"{BASE_API_URL}/{NIM_SUFFIX}"

def send_request(payload: dict):
    """
    Sends a request to the deployed NIM model.

    Args:
        payload (dict): **User must structure this input based on the model's API specifications.**
                       Refer to the model’s documentation for the correct request format.

    Returns:
        dict: JSON response from the NIM model.

    Raises:
        Exception: If the request fails, an error message is displayed.
    """
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # Add API key for authentication
        "Content-Type": "application/json"
    }

    try:
        response = requests.post(DEPLOYMENT_URL, json=payload, headers=headers)
        response.raise_for_status()  # Raise an error for bad responses (4xx, 5xx)
        return response.json()

    except requests.exceptions.RequestException as e:
        # Don't reference `response` here: it is unbound if the request itself failed.
        raise Exception(f"API request failed: {e}") from e

if __name__ == "__main__":
    # Construct this payload according to the model's documentation!
    # Example input structure (modify as per model API reference)
    sample_payload = {
        "input": "your_model_input_here"  # Replace with required input format from model docs
    }

    # Send request and print response
    try:
        result = send_request(sample_payload)
        print("API Response:", result)
    except Exception as error:
        print("Error:", error)

Unstructured model NVIDIA NIM documentation list

For unstructured models, the required NIM endpoint can be found in the NIM documentation. The list below provides the documentation link required to assemble the NIM_SUFFIX and sample_payload.

Unstructured models with text generation support

Embedding models are imported and deployed as unstructured models while maintaining the ability to request chat completions. The following embedding models support both a direct access endpoint and a chat completions endpoint:

  • arctic-embed-l
  • llama-3.2-nv-embedqa-1b-v2
  • nv-embedqa-e5-v5
  • nv-embedqa-e5-v5-pb24h2
  • nv-embedqa-mistral-7b-v2
  • nvclip

Each embedding NIM is deployed as an unstructured model, providing a REST interface at /directAccess/nim/. In addition, these models are capable of returning chat completions, so the code snippet provides a BASE_API_URL with the /chat/completions endpoint used by (structured) text generation models. To use the Prediction API scripting code, review the table below to determine how to modify the prediction snippet to access each endpoint type:

Direct access: Update the BASE_API_URL, replacing /chat/completions with /directAccess/nim/. To structure the request payload, review the model's API specifications in the NVIDIA NIM documentation for the deployed model.

Chat completion: Update the DEPLOYMENT_URL, removing /{NIM_SUFFIX} so that DEPLOYMENT_URL = BASE_API_URL. To structure the request payload, review the model's API specifications in the NVIDIA NIM documentation for the deployed model.
Prediction API scripting code
import requests

# Base API URL for the deployed NIM
BASE_API_URL = 'https://app.datarobot.com/api/v2/deployments/deployment_id/chat/completions' # noqa

# API Key for authentication
API_KEY = '{{ api_key }}'

# Define the NIM model-specific endpoint suffix (refer to the NIM API reference for the correct suffix)
NIM_SUFFIX = 'v1/your-model-endpoint'  # Modify this based on the required model

# Construct the full API URL for the request
DEPLOYMENT_URL = f"{BASE_API_URL}/{NIM_SUFFIX}"
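
For reference, here is a minimal sketch of how the two endpoint URLs can be derived from the snippet above. The NIM_SUFFIX value shown (v1/embeddings) is a hypothetical example; take the real suffix from the NVIDIA NIM documentation for the deployed model.

Deriving both endpoint URLs (sketch)

BASE_API_URL = 'https://app.datarobot.com/api/v2/deployments/deployment_id/chat/completions'  # noqa
NIM_SUFFIX = 'v1/embeddings'  # hypothetical suffix; take the real value from the NIM documentation

# Direct access: replace /chat/completions with /directAccess/nim/ and keep the NIM suffix.
DIRECT_ACCESS_URL = BASE_API_URL.replace('/chat/completions', '/directAccess/nim') + f'/{NIM_SUFFIX}'

# Chat completion: use the base URL as-is, without the NIM suffix.
CHAT_COMPLETIONS_URL = BASE_API_URL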

Updated April 3, 2025