# NVIDIA NIM Configuration
Part of NVIDIA AI Enterprise, NVIDIA NIM microservices are a set of easy-to-use microservices that accelerate the deployment of foundation models on any cloud or data center while helping keep your data secure. NIM microservices ship with production-grade runtimes, including ongoing security updates.
DataRobot runs these NIM microservices on top of the Custom Models infrastructure so we can layer on our Bolt-on-Governance APIs.
## Dependencies
- [Custom Models with GPU Inference](./custom-models-configuration.md#gpu-configuration)
- Custom Model NIM Templates
- NVIDIA Enterprise License and NGC API Key
- Custom Models Internet Access
## Supported NIMs
In the 11.0 release, we support a subset of the NIMs that NVIDIA provides. For a full list of available NIMs and more detailed hardware requirements, consult the upstream NVIDIA documentation. The table below lists the NIMs that DataRobot supports, along with recommended GPU configurations.
Customers may attempt to launch NIMs on non-recommended GPU/hardware configurations, but those launches can fail after hitting the startup timeout ([configurable](./custom-models-configuration.md)).
### GPU Recommendations
| Name | GPU Recommendations |
|------|---------------------|
| ai-generated-image-detection | 1xA100, 1xA10G, 1xH100, 1xL40S |
| alphafold2 | 1xA100, 1xL40S, 4xL40S, 8xL40S |
| alphafold2-multimer | 1xA100, 1xL40S, 4xL40S, 8xL40S |
| arctic-embed-l | 1xA100, 1xA10G, 1xH100, 1xL40S |
| codellama-13b-instruct | 2xA100, 4xA100, 4xA10G, 8xA10G, 2xH100, 4xH100 |
| codellama-34b-instruct | 2xA100, 4xA100, 4xA10G, 8xA10G, 2xH100, 4xH100, 4xL40S |
| codellama-70b-instruct | 4xA100, 8xA100, 8xA10G, 4xH100, 8xH100 |
| corrdiff | 1xA100, 1xH100, 1xL40S, 1xRTX |
| deepfake-image-detection | 1xA100, 1xA10G, 1xH100, 1xL40S |
| deepseek-r1-distill-llama-8b | 1xA100, 2xA10G, 1xH100, 4xL40S |
| deepseek-r1-distill-qwen-7b | 1xA10G, 1xL40S |
| diffdock | 1xA10G, 1xL40S |
| evo2-40b | 1xH100, 1xH200, 2xL40S, 4xL40S, 8xL40S |
| fourcastnet | 1xA100, 1xH100, 1xL40S, 1xRTX |
| gemma-2-2b-instruct | 1xA100, 1xA10G, 1xH100, 1xL40S |
| gemma-2-9b-it | 1xA100, 1xA10G, 1xH100, 1xL40S, 2xT4 |
| genmol | 1xA100, 1xA10G, 1xA6000, 1xH100, 1xL40S |
| llama-2-13b-chat | 1xA100, 2xA100, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-2-70b-chat | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 4xL40S |
| llama-2-7b-chat | 1xA100, 2xA100, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-3-sqlcoder-8b | 1xA10G, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| llama-3-swallow-70b-instruct-v0.1 | 2xA100, 4xA10G, 2xH100, 4xH100, 2xL40S |
| llama-3-taiwan-70b-instruct | 2xA100, 4xA10G, 2xH100, 4xH100, 2xL40S |
| llama-3.1-70b-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 1xH200, 2xH200, 4xH200, 4xL40S |
| llama-3.1-8b-base | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 2xL40S |
| llama-3.1-8b-instruct | 2xA10G, 1xH100, 1xL40S |
| llama-3.1-nemoguard-8b-content-safety | 1xL40S, 4xL40S, 8xL40S |
| llama-3.1-nemoguard-8b-topic-control | 1xL40S, 4xL40S, 8xL40S |
| llama-3.1-nemotron-70b-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 8xL40S |
| llama-3.1-swallow-70b-instruct-v0.1 | 4xA100, 4xH100, 2xH200, 8xL40S |
| llama-3.1-swallow-8b-instruct-v0.1 | 1xA10G, 1xL40S |
| llama-3.2-11b-vision-instruct | 1xA100, 2xA100, 4xA10G, 8xA10G, 1xH100, 2xH100, 1xH200, 2xH200, 2xL40S, 4xL40S |
| llama-3.2-90b-vision-instruct | 4xA100, 8xA100, 2xH100, 4xH100, 8xH100, 1xH200, 2xH200, 4xH200, 8xL40S |
| llama-3.2-nv-embedqa-1b-v2 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| llama-3.2-nv-rerankqa-1b-v2 | 1xA100, 1xA10G, 1xH100, 1xL4, 1xL40S |
| llama-3.3-70b-instruct | 8xA100, 8xH100, 4xH200, 8xL40S |
| llama3-70b-instruct | 4xA100, 8xA10G, 4xH100, 8xH100, 8xL40S |
| llama3-8b-instruct | 1xA100, 2xA100, 1xA10G, 2xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| maisi | 1xA100, 4xA10G, 1xH100, 2xL40S |
| mistral-7b-instruct-v0.3 | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| mistral-nemo-12b-instruct | 1xA100, 2xA100, 8xA10G, 1xH100, 2xH100 |
| mistral-nemo-minitron-8b-8k-instruct | 1xA100, 2xA100, 2xA10G, 4xA10G, 1xH100, 2xH100, 1xL40S, 2xL40S |
| mixtral-8x7b-instruct-v01 | 2xA100, 4xA100, 8xA10G, 2xH100, 4xH100, 4xL40S |
| molmim | 1xA10G, 1xL40S |
| nemoguard-jailbreak-detect | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-e5-v5 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-e5-v5-pb24h2 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-embedqa-mistral-7b-v2 | 1xA100, 1xH100, 1xL40S |
| nv-rerankqa-mistral-4b-v3 | 1xA100, 1xA10G, 1xH100, 1xL40S |
| nv-yolox-page-elements-v1 | 1xA100, 1xA10G, 1xL40S |
| nvclip | 1xA100, 1xA10G, 1xH100, 1xL40S |
| paddleocr | 1xA100, 1xA10G, 1xT4 |
| phi-3-mini-4k-instruct | 1xA100, 1xA10G, 1xH100, 1xL40S |
| phind-codellama-34b-v2-instruct | 2xA100, 4xA100, 8xA10G, 2xH100, 4xH100, 4xL40S |
| proteinmpnn | 1xA10G, 1xL40S |
| qwen-2.5-72b-instruct | 1xA10G, 1xH100, 1xL40S |
| qwen-2.5-7b-instruct | 1xA10G, 1xH100, 1xL40S |
| rfdiffusion | 1xA100, 1xH100, 1xL40S |
| starcoderbase-15b | 2xA10G, 4xA10G, 8xA10G, 2xH100, 2xL40S, 4xL40S, 8xL40S |
| vista3d | 1xA100, 1xH100, 1xL40S, 1xRTX |
## General Hardware Requirements
NVIDIA publishes general guidelines for hardware requirements, and we stay in line with them. Please review the official docs; a brief summary follows:
- Disk: the base NIM Docker images range from 4 GB to 23 GB. You will also need enough disk space to store the model weights (2 GB per billion parameters is a good rule of thumb), and for `trtllm_buildable` profiles you will need 2-3x that amount (see the sizing sketch after this list).
- CPU: greater than or equal to 8 cores.
- Memory: 32 GB of RAM minimum for pre-optimized models. For `trtllm_buildable` profiles you will need considerably more RAM; consult the official documentation.
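The disk rule of thumb above lends itself to a quick calculation. Below is a minimal sizing sketch; the default image size and the example parameter count are assumptions you should replace with the actual values for your NIM.

```python
# Back-of-the-envelope disk sizing for a NIM deployment, based on the
# guidelines above: base image size plus ~2 GB per billion parameters,
# multiplied for trtllm_buildable profiles (worst case of the 2-3x guidance).
def estimate_disk_gb(params_billions: float, image_gb: float = 23.0,
                     trtllm_buildable: bool = False) -> float:
    weights_gb = 2.0 * params_billions
    if trtllm_buildable:
        weights_gb *= 3.0
    return image_gb + weights_gb

# Example: an 8B-parameter LLM with a buildable TRT-LLM profile.
print(f"~{estimate_disk_gb(8, trtllm_buildable=True):.0f} GB")  # ~71 GB
```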
## Software Requirements
NVIDIA has guidelines on the software to install on the Kubernetes nodes that will run the NIMs. The most critical of their requirements is that the required versions of the NVIDIA and CUDA drivers are installed and functional. Docker is not required on Kubernetes nodes, as containerd or CRI-O is sufficient.
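As a quick sanity check that a node's driver stack is functional, something like the following can be run on the GPU node. It assumes `nvidia-smi` is on the `PATH` and only confirms that the driver responds; it is not a substitute for NVIDIA's full software requirements.

```python
import subprocess

# Query the driver through nvidia-smi; a non-zero exit code or a missing
# binary means the driver stack is not functional on this node.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    gpu, driver = [field.strip() for field in line.split(",", 1)]
    print(f"GPU: {gpu}, driver: {driver}")
```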
## Load NIM Templates
NIM Templates are built on top of the same infrastructure that provides Application Templates and Custom Metrics Templates. For a detailed explanation of all the configuration options for this tool, consult the dedicated [documentation](./custom-templates-core-integration-job.md). All NIM templates and required execution environments are installed automatically as part of the standard helm install process.
These templates reference the included NIM execution environment, which contains the DRUM server that proxies requests between the user and the NIM container and overlays MLOps monitoring and other functionality on those requests. The templates also define supported Runtime Parameters and recommended resource bundles, along with optional `custom.py` implementations for specific models. If changes are required for any NIM, a new version of the `datarobot-custom-templates` image will need to be installed.
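To confirm the NIM execution environment landed after the helm install, you can list execution environments over the REST API. This is a sketch under assumptions: that `DATAROBOT_ENDPOINT` points at the `/api/v2` base URL and that the environment's name contains "NIM"; adjust both to your installation.

```python
import os
import requests

# List execution environments and surface any whose name mentions NIM.
resp = requests.get(
    f"{os.environ['DATAROBOT_ENDPOINT']}/executionEnvironments/",
    headers={"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"},
)
resp.raise_for_status()
for env in resp.json()["data"]:
    if "nim" in env["name"].lower():
        print(env["name"])
```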
## Configure Custom Models with GPU Inference
The NIMs functionality is built on top of the Custom Models platform and specifically requires the GPU support that was added in 10.0. See the documentation that was created as part of that release, as it is still applicable here: [Custom Models with GPU Inference](./custom-models-configuration.md#gpu-configuration). The main point to take from that guide is to define the appropriate resource bundles. The bundles are highly dependent on the customer hardware and the specific NIMs you intend to run.
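When planning which bundles to define, the GPU recommendations table above is the starting point. The sketch below shows one way to reason about that mapping; the two table entries and the `pick_bundle` helper are illustrative, not a DataRobot API.

```python
# Recommended GPU configurations for a couple of NIMs, copied from the
# table above; extend this with the NIMs you plan to deploy.
RECOMMENDED_GPUS = {
    "llama-3.1-8b-instruct": ["2xA10G", "1xH100", "1xL40S"],
    "mixtral-8x7b-instruct-v01": ["2xA100", "4xA100", "8xA10G", "2xH100", "4xH100", "4xL40S"],
}

def pick_bundle(nim: str, available: set[str]) -> str:
    """Return the first recommended configuration the cluster can satisfy."""
    for config in RECOMMENDED_GPUS[nim]:
        if config in available:
            return config
    raise LookupError(f"no recommended configuration for {nim} on this hardware")

print(pick_bundle("llama-3.1-8b-instruct", {"1xA10G", "1xL40S"}))  # -> 1xL40S
```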
## Configure Feature Flags
The `ENABLE_NIM_MODELS` feature flag must be turned on to utilize the NIM functionality. This feature flag also has the following dependencies:
- `ENABLE_MLOPS`: NIMs are part of MLOps
- `ENABLE_CUSTOM_INFERENCE_MODEL`: NIMs are built on the Custom Models platform
- `ENABLE_CUSTOM_MODEL_GPU_INFERENCE`: NIMs require GPU access
- `ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES`: dependency of the above flag
- `ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS`: internet access is required to download model weights
- `ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE`: many of the NIMs are LLMs, which require the `textGeneration` target type
Other GenAI Related Flags:
- `ENABLE_COMPLIANCE_DOCUMENTATION`
- `ENABLE_CUSTOM_MODEL_FEATURE_FILTERING`
- `ENABLE_CUSTOM_MODEL_GITHUB_CI_CD`
- `ENABLE_CUSTOM_MODEL_PREDICT_RESPONSE_EXTRA_MODEL_OUTPUT`
- `ENABLE_GENAI_EXPERIMENTATION`
- `ENABLE_MLOPS_ACTUALS_STORAGE`
- `ENABLE_MMM_DATA_QUALITY`
- `ENABLE_MMM_GLOBAL_MODELS_IN_MODEL_REGISTRY`
- `ENABLE_MMM_VDB_DEPLOYMENT_TYPE`
- `ENABLE_MODERATION_GUARDRAILS`
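A small self-contained check of the required flag set can catch a missing dependency before a deployment attempt. How you fetch the currently enabled flags is installation specific, so the `enabled` set below is a stand-in for that lookup.

```python
# Required flags from the dependency list above.
REQUIRED_FLAGS = {
    "ENABLE_NIM_MODELS",
    "ENABLE_MLOPS",
    "ENABLE_CUSTOM_INFERENCE_MODEL",
    "ENABLE_CUSTOM_MODEL_GPU_INFERENCE",
    "ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES",
    "ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS",
    "ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE",
}

enabled = {"ENABLE_NIM_MODELS", "ENABLE_MLOPS"}  # stand-in for a real lookup
missing = sorted(REQUIRED_FLAGS - enabled)
if missing:
    print("Enable these flags before deploying NIMs:")
    for flag in missing:
        print(f"  {flag}")
```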
## NVIDIA Enterprise License and NGC API Key
Usage of NIMs requires an NVIDIA Enterprise License. Follow NVIDIA's documentation to create your NGC account if you do not already have one.
### Generate an API Key from NGC
Once you have an account, sign in and browse to the Setup page, accessible from the dropdown that appears after clicking your profile in the top-right portion of the page. Then:
- Click the `Generate API Key` button.
- Click `Generate Personal Key` in the top right of the page.
- Fill in the form modal:
    - Give the key a descriptive `Key Name`.
    - Set the expiration to something appropriate for your IT policies.
    - Select `Private Registry` and `NGC Catalog` from the dropdown in the `Services Included` field.
- Click `Generate Personal Key`.
- Be sure to record your key, as you will not be able to view it again.
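Before storing the key, it can be worth verifying that it works. One hedged way, assuming a local Docker client, is to log in to NVIDIA's container registry, which accepts the literal username `$oauthtoken` with the NGC API key as the password:

```python
import os
import subprocess

# Feed the key to docker login over stdin so it never appears in argv.
ngc_api_key = os.environ["NGC_API_KEY"]  # the personal key recorded above
subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=ngc_api_key, text=True, check=True,
)
print("NGC key accepted by nvcr.io")
```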
### Save NGC Key into DataRobot Secure Configuration
The DataRobot NIM Templates use Secure Configuration to securely transmit the NGC Key from the DataRobot credentials store to the running NIM container. The credential is used to pull the base Docker image if it is not cached locally, and to pull the model weights from NGC at startup (local caching of weights is not supported in 11.0).
You must create a new Secure Configuration entry with the new NGC API Token type. The entry must be shared with at least Consumer-level access to the organization or users that will deploy NIMs.
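For scripted installs, the same entry can likely be created over the REST API. This sketch assumes the credentials endpoint accepts an NGC-specific credential type; the `ngc_api_token` type name and the payload shape are assumptions to verify against your DataRobot version, and the UI flow described above remains the supported path.

```python
import os
import requests

# Create a credential entry holding the NGC key; field names are assumed.
resp = requests.post(
    f"{os.environ['DATAROBOT_ENDPOINT']}/credentials/",
    headers={"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"},
    json={
        "name": "ngc-api-token",
        "credentialType": "ngc_api_token",  # assumed type name for NGC keys
        "apiToken": os.environ["NGC_API_KEY"],
    },
)
resp.raise_for_status()
print("Created credential:", resp.json()["credentialId"])
```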