# Import and deploy with NVIDIA NIM

> Import and deploy with NVIDIA NIM - Import, register, and deploy models with NVIDIA NIM to create an
> inference endpoint. Interact with inference endpoints using code or the DataRobot UI.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:10.045036+00:00` (UTC).

## Primary page

- [Import and deploy with NVIDIA NIM](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html): Full documentation for this topic (HTML).

## Sections on this page

- [Import from NVIDIA GPU Cloud (NGC)](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#import-from-nvidia-gpu-cloud-ngc): In-page section heading.
- [Deploy the registered NVIDIA NIM](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#deploy-the-registered-nvidia-nim): In-page section heading.
- [Make predictions with the deployed NVIDIA NIM](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#make-predictions-with-the-deployed-nvidia-nim): In-page section heading.
- [Text generation model endpoints](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#text-generation-model-endpoints): In-page section heading.
- [Unstructured model endpoints](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#unstructured-model-endpoints): In-page section heading.
- [Unstructured models with text generation support](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#unstructured-models-with-text-generation-support): In-page section heading.

## Related documentation

- [NextGen UI documentation](https://docs.datarobot.com/en/docs/workbench/index.html): Linked from this page.
- [Registry](https://docs.datarobot.com/en/docs/workbench/nxt-registry/index.html): Linked from this page.
- [Models](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/index.html): Linked from this page.
- [review the version information](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-view-manage-reg-models.html#view-version-details): Linked from this page.
- [Optionally, configure additional deployment settings](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-deploy-models.html#configure-deployment-settings): Linked from this page.
- [tracing table](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-data-exploration.html#explore-deployment-data-tracing): Linked from this page.
- [enable prediction row storage](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-data-exploration-settings.html): Linked from this page.
- [define an association ID](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-custom-metrics-settings.html): Linked from this page.
- [Bolt-on Governance API](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/genai-chat-completion-api.html): Linked from this page.

## Documentation content

> [!NOTE] Premium
> The use of NVIDIA Inference Microservices (NIM) in DataRobot requires access to premium features for GenAI experimentation and GPU inference. Contact your DataRobot representative or administrator for information on enabling the required features.

The DataRobot integration with the NVIDIA AI Enterprise Suite enables users to perform one-click deployment of NVIDIA Inference Microservices (NIM) on GPUs in DataRobot Serverless Compute. This process starts in Registry, where you can import NIM containers from the NVIDIA AI Enterprise model catalog. The registered model is optimized for deployment to Console and is compatible with the DataRobot monitoring and governance framework.

NVIDIA NIM provides optimized foundational models you can add to a playground in Workbench for evaluation and inclusion in agentic blueprints, embedding models used to create vector databases, and NVIDIA NeMo Guardrails used in the DataRobot moderation framework to secure your agentic application.

## Import from NVIDIA GPU Cloud (NGC)

On the Models tab in Registry, create a registered model from the gallery of available NIM models, selecting the model name and performance profile and reviewing the information provided on the model card.

To import from NVIDIA NGC:

1. On the **Registry > Models** tab, next to **+ Register a model**, click the adjacent menu icon, and then click **Import from NVIDIA NGC**.
2. In the **Import from NVIDIA NGC** panel, on the **Select NIM** tab, click a NIM in the gallery. To direct your search, you can **Search**, filter by **Publisher**, or click **Sort by** to order the gallery by date added or alphabetically (ascending or descending).
3. Review the model information from the NVIDIA NGC source, then click **Next**.
4. On the **Register model** tab, configure the following fields and click **Register**:

| Field | Description |
| --- | --- |
| Registered model name / Registered model | Configure one of the following. **Registered model name**: when registering a new model, enter a *unique* and descriptive name for the new registered model; if you choose a name that exists anywhere within your organization, a warning appears. **Registered model**: when saving as a version of an existing model, select the existing registered model you want to add a new version to. |
| Registered version name | Automatically populated with the model name and the word *version*. Modify the default version name as necessary. |
| Registered model version | Assigned automatically. This displays the expected version number (e.g., V1, V2, V3) of the version you create. This is always **V1** when you select **Register as a new model**. |
| Resource bundle | Recommended automatically. If possible, DataRobot translates the GPU requirements for the selected model into a resource bundle. In some cases, DataRobot can't detect a compatible resource bundle; to identify a resource bundle with sufficient VRAM, review the documentation for that NIM. For Managed AI Platform installations, note that the **GPU - 5XL** resource bundle can be difficult to procure on demand; if possible, consider a smaller resource bundle. |
| NVIDIA NGC API key | Select the credential associated with your NVIDIA NGC API key. Ensure that the selected NVIDIA NGC API key exists in your DataRobot organization, as cross-organization sharing of NVIDIA NGC API keys is unsupported. In addition, due to this restriction, cross-organization sharing of global models created with NVIDIA NIM is unsupported. |
| Registered version description (optional) | Enter a description of the business problem this model package solves or, more generally, describe the model represented by this version. |
| Tags (optional) | Click **+ Add tag** and enter a **Key** and a **Value** for each key-value pair you want to tag the model *version* with. Tags added when registering a new model are applied to **V1**. |

## Deploy the registered NVIDIA NIM

After the NVIDIA NIM is registered, deploy it to a DataRobot Serverless prediction environment.

To deploy a registered model to a DataRobot Serverless environment:

1. On the **Registry > Models** tab, locate and click the registered NIM, and then click the version to deploy.
2. In the registered model version, you can [review the version information](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-view-manage-reg-models.html#view-version-details), then click **Deploy**.
3. In the **Prediction history and service health** section, under **Choose prediction environment**, verify that the correct prediction environment with **Platform: DataRobot Serverless** is selected. If the correct DataRobot Serverless environment isn't selected, click **Change**; then, on the **Select prediction environment** panel's **DataRobot Serverless** tab, select a different serverless prediction environment from the list.
4. [Optionally, configure additional deployment settings](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-deploy-models.html#configure-deployment-settings). Then, when the deployment is configured, click **Deploy model**. To enable the [tracing table](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-data-exploration.html#explore-deployment-data-tracing) for the NIM deployment, ensure that you [enable prediction row storage](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-data-exploration-settings.html) in the data exploration (or challenger) settings and configure the deployment settings required to [define an association ID](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-custom-metrics-settings.html).

## Make predictions with the deployed NVIDIA NIM

After the model is deployed to a DataRobot Serverless prediction environment, you can access real-time prediction snippets from the deployment's Predictions tab. The requirements for running the prediction snippet depend on the model type: text generation or unstructured.

**Text generation:**

![Text generation prediction snippet](https://docs.datarobot.com/en/docs/images/nxt-predict-nvidia-nim.png)

**Unstructured:**

![Unstructured prediction snippet](https://docs.datarobot.com/en/docs/images/nxt-predict-nvidia-nim-unstructured.png)


When you add a NIM to Registry in DataRobot, LLMs are imported as text generation models, allowing you to use the [Bolt-on Governance API](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/genai-chat-completion-api.html) to communicate with the deployed NIM. Other types of models are imported as unstructured models and endpoints provided by the NIM containers are exposed to communicate with the deployed NIM. This provides the flexibility required to deploy any NIM on GPU infrastructure using DataRobot Serverless Compute.

| Target type | Supported endpoint type | Description |
| --- | --- | --- |
| Text generation | `/chat/completions` | Deployed text generation NIM models provide access to the `/chat/completions` endpoint. Use the code snippet provided on the Predictions tab to make predictions. |
| Unstructured | `/directAccess/nim/` | Deployed unstructured NIM models provide access to the `/directAccess/nim/` endpoint. Modify the code snippet provided on the Predictions tab to provide a NIM URL suffix and a properly formed payload. |
| Unstructured (embedding model) | Both | Deployed unstructured NIM embedding models can provide access to both the `/directAccess/nim/` and `/chat/completions` endpoints. Modify the code snippet provided on the Predictions tab to suit your intended usage. |

> [!NOTE] CSV predictions endpoint use
> With an imported text generation NIM, it is also possible to make requests to the `/predictions` endpoint (accepting CSV input). For CSV input submitted to the `/predictions` endpoint, ensure that you use `promptText` as the column name for user prompts to the text generation model. If the CSV input isn't provided in this format, those predictions do not appear in the deployment's [tracing table](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-data-exploration.html#explore-deployment-data-tracing).
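As a sketch of the expected input shape, the CSV body for a `/predictions` request can be assembled as follows; the prompt text is illustrative, and only the `promptText` column name comes from the note above:

```python
# Build a minimal CSV body for the /predictions endpoint of a text
# generation NIM. The user prompt column must be named promptText so the
# predictions appear in the deployment's tracing table.
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["promptText"])            # required column name
writer.writerow(["What is NVIDIA NIM?"])   # illustrative prompt
csv_body = buf.getvalue()
```

Submit `csv_body` as the request body of a standard `/predictions` call, following the snippet on the deployment's Predictions tab.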

### Text generation model endpoints

Access the Prediction API scripting code on the deployment's Predictions > Prediction API tab. For a text generation model, the endpoint link required is the base URL of the DataRobot deployment. For more information, see the [Bolt-on Governance API](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/genai-chat-completion-api.html) documentation.
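The call pattern can be sketched as follows. The deployment ID, API key, and base URL are placeholders to replace with values from the Predictions tab, and the snippet DataRobot generates may differ in detail:

```python
# Sketch: build (but do not send) a chat completion request to a deployed
# text generation NIM. All identifiers below are placeholders.
import json
import urllib.request

API_KEY = "YOUR_DATAROBOT_API_TOKEN"  # placeholder credential
BASE_API_URL = "https://app.datarobot.com/api/v2/deployments/DEPLOYMENT_ID"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Assemble a POST request for the /chat/completions endpoint."""
    payload = {
        "model": "deployed-nim",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_API_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_chat_request("Summarize NVIDIA NIM in one sentence.")
# Send with urllib.request.urlopen(request) once the placeholders are real.
```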

**Prediction snippets for Private CA environments**

For Self-managed AI Platform installations in a Private Certificate Authority (Private CA) environment, the snippets provided on the Predictions tab may need to be updated, depending on how your organization's IT team configured the Private CA environment.

If your organization's Private CA environment requires modifications to the provided prediction snippet, locate the following code:

*(Standard Prediction API scripting code: a five-line snippet, shown on the deployment's Predictions tab but not reproduced in this companion file.)*

Update the code above, making the following changes to allow the prediction snippet to access the Private CA bundle file:

*(Private CA Prediction API scripting code: an eight-line modified snippet, not reproduced in this companion file.)*

### Unstructured model endpoints

Access the Prediction API scripting code from the deployment's Predictions > Prediction API tab. For unstructured models, endpoints provided by the NIM containers are exposed to enable communication with the deployed NIM. To determine how to construct the correct endpoint URL and send a request to a deployed NVIDIA NIM instance, refer to the documentation for the registered and deployed NIM, [listed below](https://docs.datarobot.com/en/docs/workbench/nxt-registry/nxt-model-directory/nxt-import-nvidia-ngc.html#nim-documentation-list).

> [!NOTE] Observability for direct access endpoints
> Most unstructured models from NVIDIA NIM only provide access to the `/directAccess/nim/` endpoint. This endpoint is compatible with a limited set of observability features. For example, accuracy and drift tracking is not supported for the `/directAccess/nim/` endpoint.

To use the Prediction API scripting code, perform the following steps and use the `send_request` function to communicate with the model:

1. Review the `BASE_API_URL` (line 4). This is the prefix of the endpoint; it automatically populates with the deployment's base URL.
2. Retrieve the appropriate `NIM_SUFFIX` (line 10). This is the suffix of the NIM endpoint. Locate this suffix in the NVIDIA NIM documentation for the deployed model.
3. Construct the request payload (`sample_payload`, line 45). This request payload must be structured based on the model's API specifications from the NVIDIA NIM documentation for the deployed model.

*(Prediction API scripting code: a 54-line snippet, shown on the deployment's Predictions tab but not reproduced in this companion file.)*
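The three steps above can be sketched as follows. `BASE_API_URL`, the API key, `NIM_SUFFIX`, and `sample_payload` are all placeholders: the suffix and payload shape are model-specific and come from the NVIDIA NIM documentation, and the generated snippet will differ in detail:

```python
# Sketch of the send_request pattern for a deployed unstructured NIM.
import json
import urllib.request

API_KEY = "YOUR_DATAROBOT_API_TOKEN"  # placeholder credential
BASE_API_URL = "https://app.datarobot.com/api/v2/deployments/DEPLOYMENT_ID/directAccess/nim/"
NIM_SUFFIX = "v1/embeddings"  # hypothetical suffix; see the NIM docs for your model

def send_request(payload: dict) -> urllib.request.Request:
    """Build the direct-access request; send it with urllib.request.urlopen()."""
    url = BASE_API_URL.rstrip("/") + "/" + NIM_SUFFIX
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical embedding-style payload; the real structure is defined by the
# model's API specification in the NVIDIA NIM documentation.
sample_payload = {"input": ["example text"], "model": "nv-embedqa-e5-v5"}
request = send_request(sample_payload)
```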

**Unstructured model NVIDIA NIM documentation list**

For unstructured models, the required NIM endpoint can be found in the NIM documentation. Consult the NVIDIA NIM documentation for each of the following models to assemble the `NIM_SUFFIX` and `sample_payload`:

- arctic-embed-l
- boltz-2
- cuopt
- diffdock
- genmol
- llama-3.2-nv-embedqa-1b-v2
- llama-3.2-nv-rerankqa-1b-v2
- molmim
- nemoguard-jailbreak-detect
- nemoretriever-graphic-elements-v1
- nemoretriever-page-elements-v2
- nemoretriever-parse
- nemoretriever-table-structure-v1
- nv-embedqa-e5-v5
- nv-embedqa-e5-v5-pb24h2
- nv-embedqa-mistral-7b-v2
- nv-rerankqa-mistral-4b-v3
- nvclip
- openfold3
- paddleocr
- proteinmpnn
- rfdiffusion

### Unstructured models with text generation support

Embedding models are imported and deployed as unstructured models while maintaining the ability to request chat completions. 
The following embedding models support both a direct access endpoint and a chat completions endpoint:

- arctic-embed-l
- llama-3.2-nv-embedqa-1b-v2
- nv-embedqa-e5-v5
- nv-embedqa-e5-v5-pb24h2
- nv-embedqa-mistral-7b-v2
- nvclip

Each embedding NIM is deployed as an unstructured model, providing a REST interface at `/directAccess/nim/`. In addition, these models are capable of returning chat completions, so the code snippet provides a `BASE_API_URL` with the `/chat/completions` endpoint used by (structured) text generation models. To use the Prediction API scripting code, review the table below to determine how to modify the prediction snippet to access each endpoint type:

| Endpoint type | Requirements |
| --- | --- |
| Direct access | Update the `BASE_API_URL` (on line 4), replacing `/chat/completions` with `/directAccess/nim/`. To structure the request payload, review the model's API specifications from the NVIDIA NIM documentation for the deployed model. |
| Chat completion | Update the `DEPLOYMENT_URL` (on line 13), removing `/{NIM_SUFFIX}` to create `DEPLOYMENT_URL = BASE_API_URL`. To structure the request payload, review the model's API specifications from the NVIDIA NIM documentation for the deployed model. |

*(Prediction API scripting code: a 13-line snippet, shown on the deployment's Predictions tab but not reproduced in this companion file.)*
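The two URL rewrites in the table above can be sketched as plain string edits; the base URL below is a placeholder standing in for the value from the Predictions tab, and the suffix is hypothetical:

```python
# Derive both endpoint URLs for an embedding NIM from the snippet's defaults.
BASE_API_URL = "https://app.datarobot.com/api/v2/deployments/DEPLOYMENT_ID/chat/completions"
NIM_SUFFIX = "v1/embeddings"  # hypothetical; model-specific, see the NIM docs

# Chat completion: drop /{NIM_SUFFIX} so DEPLOYMENT_URL equals BASE_API_URL.
chat_completion_url = BASE_API_URL

# Direct access: replace /chat/completions with /directAccess/nim/, then
# append the model-specific suffix.
direct_access_url = (
    BASE_API_URL.replace("/chat/completions", "/directAccess/nim/") + NIM_SUFFIX
)
```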
