# NVIDIA NIM Configuration
Part of NVIDIA AI Enterprise, NVIDIA NIM microservices are a set of easy-to-use microservices that accelerate the deployment of foundation models on any cloud or data center while helping keep your data secure. NIM microservices ship with production-grade runtimes, including ongoing security updates.
DataRobot runs these NIM microservices on top of the Custom Models infrastructure so we can layer on our Bolt-on-Governance APIs.
## Dependencies
- [Custom Models with GPU Inference](./custom-models-configuration.md#gpu-configuration)
- Custom Model NIM Templates
- NVIDIA Enterprise License and NGC API Key
- Custom Models Internet Access
## Supported NIMs
In the 11.0 release, we support a subset of the NIMs that NVIDIA provides. For a full list of available NIMs and more detailed hardware requirements, consult the upstream NVIDIA documentation. The table below lists the NIMs that DataRobot supports, along with recommended GPU configurations.
Customers may attempt to launch NIMs on non-recommended GPU/hardware configurations, but those launches can fail after hitting the startup timeout ([configurable](./custom-models-configuration.md)).
### GPU Recommendations
| Name | GPU Recommendations |
|------|---------------------|
| ai-generated-image-detection | 1xA100, 1xA10G, 1xH100, 1xL40S |
| alphafold2 | 1xA100, 1xL40S, 4xL40S, 8xL40S |
| alphafold2-multimer | 1xA100, 1xL40S, 4xL40S, 8xL40S |
| arctic-embed-l | 1xA100, 1xA10G, 1xH100, 1xL40S |
| codellama-13b-instruct | 2xA100, 4xA100, 4xA10G, 8xA10G, 2xH100, 4xH100 |
| codellama-34b-instruct | 2xA100, 4xA100, 4xA10G, 8xA10G, 2xH100, 4xH100, 4xL40S |
| codellama-70b-instruct | 4xA100, 8xA100, 8xA10G, 4xH100, 8xH100 |
| corrdiff | 1xA100, 1xH100, 1xL40S, 1xRTX |
| deepfake-image-detection | 1xA100, 1xA10G, 1xH100, 1xL40S |
| deepseek-r1-distill-llama-8b | 1xA100, 2xA10G, 1xH100, 4xL40S |
| deepseek-r1-distill-qwen-7b | 1xA10G, 1xL40S |
| diffdock | 1xA10G, 1xL40S |
| evo2-40b | 1xH100, 1xH200, 2xL40S, 4xL40S, 8xL40S |
| fourcastnet | 1xA100, 1xH100, 1xL40S, 1xRTX |
| gemma-2-2b-instruct | 1xA100, 1xA10G, 1xH100, 1xL40S |
| gemma-2-9b-it | 1xA100, 1xA10G, 1xH100, 1xL40S, 2xT4 |
| genmol | 1xA100, 1xA10G, 1xA6000, 1xH100, 1xL40S |
| llama-2-13b-chat | 1xA100, 2xA100, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-2-70b-chat | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 4xL40S |
| llama-2-7b-chat | 1xA100, 2xA100, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-3-sqlcoder-8b | 1xA10G, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-3-swallow-70b-instruct-v0.1 | 2xA100, 4xA10G, 2xH100, 4xH100, 2xL40S |
| llama-3-taiwan-70b-instruct | 2xA100, 4xA10G, 2xH100, 4xH100, 2xL40S |
| llama-3.1-70b-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 1xH200, 2xH200, 4xH200, 4xL40S |
| llama-3.1-8b-base | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 2xL40S |
| llama-3.1-8b-instruct | 2xA10G, 1xH100, 1xL40S |
| llama-3.1-nemoguard-8b-content-safety | 1xL40S, 4xL40S, 8xL40S |
| llama-3.1-nemoguard-8b-topic-control | 1xL40S, 4xL40S, 8xL40S |
| llama-3.1-nemotron-70b-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 8xL40S |
| llama-3.1-swallow-70b-instruct-v0.1 | 4xA100, 4xH100, 2xH200, 8xL40S |
| llama-3.1-swallow-8b-instruct-v0.1 | 1xA10G, 1xL40S |
| llama-3.2-11b-vision-instruct | 1xA100, 2xA100, 4xA10G, 8xA10G, 1xH100, 2xH100, 1xH200, 2xH200, 2xL40S, 4xL40S |
| llama-3.2-90b-vision-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 1xH200, 2xH200, 4xH200, 8xL40S |
| llama-3.2-nv-embedqa-1b-v2 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| llama-3.2-nv-rerankqa-1b-v2 | 1xA100, 1xA10G, 1xH100, 1xL4, 1xL40S |
| llama-3.3-70b-instruct | 8xA100, 8xH100, 4xH200, 8xL40S |
| llama3-70b-instruct | 4xA100, 8xA10G, 4xH100, 8xH100, 8xL40S |
| llama3-8b-instruct | 1xA100, 2xA100, 1xA10G, 2xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| maisi | 1xA100, 4xA10G, 1xH100, 2xL40S |
| mistral-7b-instruct-v0.3 | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| mistral-nemo-12b-instruct | 1xA100, 2xA100, 8xA10G, 1xH100, 2xH100 |
| mistral-nemo-minitron-8b-8k-instruct | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| mixtral-8x7b-instruct-v01 | 2xA100, 4xA100, 8xA10G, 2xH100, 4xH100, 4xL40S |
| molmim | 1xA10G, 1xL40S |
| nemoguard-jailbreak-detect | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-e5-v5 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-e5-v5-pb24h2 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-mistral-7b-v2 | 1xA100, 1xH100, 1xL40S |
| nv-rerankqa-mistral-4b-v3 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-yolox-page-elements-v1 | 1xA100, 1xA10G, 1xL40S |
| nvclip | 1xA100, 1xA10G, 1xH100, 1xL40S |
| paddleocr | 1xA100, 1xA10G, 1xT4 |
| phi-3-mini-4k-instruct | 1xA100, 1xA10G, 1xH100, 1xL40S |
| phind-codellama-34b-v2-instruct | 2xA100, 4xA100, 8xA10G, 2xH100, 4xH100, 4xL40S |
| proteinmpnn | 1xA10G, 1xL40S |
| qwen-2.5-72b-instruct | 1xA10G, 1xH100, 1xL40S |
| qwen-2.5-7b-instruct | 1xA10G, 1xH100, 1xL40S |
| rfdiffusion | 1xA100, 1xH100, 1xL40S |
| starcoderbase-15b | 2xA10G, 4xA10G, 8xA10G, 2xH100, 2xL40S, 4xL40S, 8xL40S |
| vista3d | 1xA100, 1xH100, 1xL40S, 1xRTX |
## General Hardware Requirements
NVIDIA publishes general guidelines for hardware requirements, and we stay in line with them. Please review the official docs; a brief summary follows:
- Disk: the base NIM Docker images range from 4 GB to 23 GB. You will also need enough disk space to store the model weights (2 GB per billion parameters is a good rule of thumb), and for `trtllm_buildable` profiles you will need 2-3x that amount (see the sizing sketch after this list).
- CPU: greater than or equal to 8 cores.
- Memory: 32 GB of RAM minimum for pre-optimized models. For `trtllm_buildable` profiles you will need considerably more RAM; consult the official documentation.
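The disk rule of thumb above lends itself to a quick calculation. Below is a minimal sizing sketch; the default image size and the example parameter count are assumptions you should replace with the actual values for your NIM.

```python
# Back-of-the-envelope disk sizing for a NIM deployment, based on the
# guidelines above: base image size plus ~2 GB per billion parameters,
# multiplied for trtllm_buildable profiles (worst case of the 2-3x guidance).
def estimate_disk_gb(params_billions: float, image_gb: float = 23.0,
                     trtllm_buildable: bool = False) -> float:
    weights_gb = 2.0 * params_billions
    if trtllm_buildable:
        weights_gb *= 3.0
    return image_gb + weights_gb

# Example: an 8B-parameter LLM with a buildable TRT-LLM profile.
print(f"~{estimate_disk_gb(8, trtllm_buildable=True):.0f} GB")  # ~71 GB
```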
## Software Requirements
NVIDIA has guidelines on the software to install on the Kubernetes nodes that will run the NIMs. The most critical of their requirements is that the required versions of the NVIDIA and CUDA drivers are installed and functional. Docker is not required on Kubernetes nodes, as containerd or CRI-O is sufficient.
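As a quick sanity check that a node's driver stack is functional, something like the following can be run on the GPU node. It assumes `nvidia-smi` is on the `PATH` and only confirms that the driver responds; it is not a substitute for NVIDIA's full software requirements.

```python
import subprocess

# Query the driver through nvidia-smi; a non-zero exit code or a missing
# binary means the driver stack is not functional on this node.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    gpu, driver = [field.strip() for field in line.split(",", 1)]
    print(f"GPU: {gpu}, driver: {driver}")
```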
## Load NIM Templates
NIM Templates are built on top of the same infrastructure that provides Application Templates and Custom Metrics Templates. For a detailed explanation of all the configuration options for this tool, consult the dedicated [documentation](./custom-templates-core-integration-job.md). All NIM templates and required execution environments are installed automatically as part of the standard helm install process.
These templates reference the included NIM execution environment, which contains the DRUM server that proxies requests between the user and the NIM container and overlays MLOps monitoring and other functionality on those requests. The templates also define supported Runtime Parameters and recommended resource bundles, along with optional `custom.py` implementations for specific models. If changes are required for any NIM, a new version of the `datarobot-custom-templates` image will need to be installed.
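To confirm the NIM execution environment landed after the helm install, you can list execution environments over the REST API. This is a sketch under assumptions: that `DATAROBOT_ENDPOINT` points at the `/api/v2` base URL and that the environment's name contains "NIM"; adjust both to your installation.

```python
import os
import requests

# List execution environments and surface any whose name mentions NIM.
resp = requests.get(
    f"{os.environ['DATAROBOT_ENDPOINT']}/executionEnvironments/",
    headers={"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"},
)
resp.raise_for_status()
for env in resp.json()["data"]:
    if "nim" in env["name"].lower():
        print(env["name"])
```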
## Configure Custom Models with GPU Inference
The NIMs functionality is built on top of the Custom Models platform and specifically requires the GPU support that was added in 10.0. See the documentation that was created as part of that release, as it is still applicable here: [Custom Models with GPU Inference](./custom-models-configuration.md#gpu-configuration). The main point to take from that guide is to define the appropriate resource bundles. The bundles are highly dependent on the customer hardware and the specific NIMs you intend to run.
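When planning which bundles to define, the GPU recommendations table above is the starting point. The sketch below shows one way to reason about that mapping; the two table entries and the `pick_bundle` helper are illustrative, not a DataRobot API.

```python
# Recommended GPU configurations for a couple of NIMs, copied from the
# table above; extend this with the NIMs you plan to deploy.
RECOMMENDED_GPUS = {
    "llama-3.1-8b-instruct": ["2xA10G", "1xH100", "1xL40S"],
    "mixtral-8x7b-instruct-v01": ["2xA100", "4xA100", "8xA10G", "2xH100", "4xH100", "4xL40S"],
}

def pick_bundle(nim: str, available: set[str]) -> str:
    """Return the first recommended configuration the cluster can satisfy."""
    for config in RECOMMENDED_GPUS[nim]:
        if config in available:
            return config
    raise LookupError(f"no recommended configuration for {nim} on this hardware")

print(pick_bundle("llama-3.1-8b-instruct", {"1xA10G", "1xL40S"}))  # -> 1xL40S
```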
## Configure Feature Flags
The `ENABLE_NIM_MODELS` feature flag must be turned on to utilize the NIM functionality. This feature flag also has the following dependencies:
- `ENABLE_MLOPS`: NIMs are part of MLOps
- `ENABLE_CUSTOM_INFERENCE_MODEL`: NIMs are built on the Custom Models platform
- `ENABLE_CUSTOM_MODEL_GPU_INFERENCE`: NIMs require GPU access
- `ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES`: dependency of the above flag
- `ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS`: internet access is required to download model weights
- `ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE`: many of the NIMs are LLMs, which require the `textGeneration` target type
Other GenAI Related Flags:
- `ENABLE_COMPLIANCE_DOCUMENTATION`
- `ENABLE_CUSTOM_MODEL_FEATURE_FILTERING`
- `ENABLE_CUSTOM_MODEL_GITHUB_CI_CD`
- `ENABLE_CUSTOM_MODEL_PREDICT_RESPONSE_EXTRA_MODEL_OUTPUT`
- `ENABLE_GENAI_EXPERIMENTATION`
- `ENABLE_MLOPS_ACTUALS_STORAGE`
- `ENABLE_MMM_DATA_QUALITY`
- `ENABLE_MMM_GLOBAL_MODELS_IN_MODEL_REGISTRY`
- `ENABLE_MMM_VDB_DEPLOYMENT_TYPE`
- `ENABLE_MODERATION_GUARDRAILS`
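A small self-contained check of the required flag set can catch a missing dependency before a deployment attempt. How you fetch the currently enabled flags is installation specific, so the `enabled` set below is a stand-in for that lookup.

```python
# Required flags from the dependency list above.
REQUIRED_FLAGS = {
    "ENABLE_NIM_MODELS",
    "ENABLE_MLOPS",
    "ENABLE_CUSTOM_INFERENCE_MODEL",
    "ENABLE_CUSTOM_MODEL_GPU_INFERENCE",
    "ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES",
    "ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS",
    "ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE",
}

enabled = {"ENABLE_NIM_MODELS", "ENABLE_MLOPS"}  # stand-in for a real lookup
missing = sorted(REQUIRED_FLAGS - enabled)
if missing:
    print("Enable these flags before deploying NIMs:")
    for flag in missing:
        print(f"  {flag}")
```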
## NVIDIA Enterprise License and NGC API Key
Usage of NIMs requires an NVIDIA Enterprise License. Follow NVIDIA's documentation to create your NGC account if you do not already have one.
### Generate an API Key from NGC
Once you have an account, sign in and browse to the Setup page, accessible from the dropdown that appears after clicking your profile in the top-right portion of the page. Then:
- Click the `Generate API Key` button.
- Click `Generate Personal Key` in the top right of the page.
- Fill in the form modal:
    - Give the key a descriptive `Key Name`.
    - Set the expiration to something appropriate for your IT policies.
    - Select `Private Registry` and `NGC Catalog` from the dropdown in the `Services Included` field.
- Click `Generate Personal Key`.
- Be sure to record your key, as you will not be able to view it again.
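Before storing the key, it can be worth verifying that it works. One hedged way, assuming a local Docker client, is to log in to NVIDIA's container registry, which accepts the literal username `$oauthtoken` with the NGC API key as the password:

```python
import os
import subprocess

# Feed the key to docker login over stdin so it never appears in argv.
ngc_api_key = os.environ["NGC_API_KEY"]  # the personal key recorded above
subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=ngc_api_key, text=True, check=True,
)
print("NGC key accepted by nvcr.io")
```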
### Save NGC Key into DataRobot Secure Configuration
The DataRobot NIM Templates use Secure Configuration to securely transmit the NGC Key from the DataRobot credentials store to the running NIM container. The credential is used to pull the base Docker image if it is not cached locally, and to pull the model weights from NGC at startup (local caching of weights is not supported in 11.0).
You must create a new Secure Configuration entry with the new NGC API Token type. The entry must be shared with at least Consumer-level access to the organization or users that will deploy NIMs.
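For scripted installs, the same entry can likely be created over the REST API. This sketch assumes the credentials endpoint accepts an NGC-specific credential type; the `ngc_api_token` type name and the payload shape are assumptions to verify against your DataRobot version, and the UI flow described above remains the supported path.

```python
import os
import requests

# Create a credential entry holding the NGC key; field names are assumed.
resp = requests.post(
    f"{os.environ['DATAROBOT_ENDPOINT']}/credentials/",
    headers={"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"},
    json={
        "name": "ngc-api-token",
        "credentialType": "ngc_api_token",  # assumed type name for NGC keys
        "apiToken": os.environ["NGC_API_KEY"],
    },
)
resp.raise_for_status()
print("Created credential:", resp.json()["credentialId"])
```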