
NVIDIA NIM Configuration

Part of NVIDIA AI Enterprise, NVIDIA NIM microservices are easy-to-use microservices that accelerate the deployment of foundation models on any cloud or data center while helping keep your data secure. NIM microservices include production-grade runtimes with ongoing security updates.

DataRobot runs these NIM microservices on top of the Custom Models infrastructure so we can layer on our Bolt-on-Governance APIs.

Dependencies

Supported NIMs

Since the 11.0 release, we support a subset of the NIMs that NVIDIA provides. For a full list of available NIMs and more detailed hardware requirements, consult the upstream documentation. Below is the list of NIMs that DataRobot supports, along with recommended GPU requirements.

Customers may attempt to launch NIMs on non-recommended GPU/hardware configurations, but these launches may fail after hitting the (configurable) startup timeout.

GPU Recommendations

See the recommendations table below.

Validated Model Deployments by Cloud Provider (Minimum GPU Resource Bundle)

See the validated model deployments by cloud provider.

General Hardware Requirements

NVIDIA has general guidelines for hardware requirements, and we stay in line with them. Please review the official docs; a brief summary follows:

  • Disk: the base NIM docker images range from 4GB to 23GB. You will also need enough disk space to store the model weights (2GB per billion params is a good rule of thumb), and for trtllm_buildable profiles you will need 2-3x that amount (a sizing sketch follows this list).
  • CPU: 8 or more cores.
  • Memory: 32GB of RAM minimum for pre-optimized models. trtllm_buildable profiles need considerably more RAM; consult the official documentation.
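
To make the rule of thumb concrete, here is a minimal sizing sketch. The base image size, the 2GB-per-billion-params figure, and the 2-3x trtllm_buildable headroom come from the guidelines above; the helper function itself and the example model size are illustrative only.

```python
# Illustrative sizing helper based on the rules of thumb above; always check
# a specific NIM's documentation for its real requirements.

def estimate_disk_gb(params_billions: float,
                     base_image_gb: float = 23.0,   # worst-case base image
                     trtllm_buildable: bool = False) -> float:
    weights_gb = 2.0 * params_billions              # ~2GB per billion params
    multiplier = 3.0 if trtllm_buildable else 1.0   # 2-3x headroom; use 3x to be safe
    return base_image_gb + weights_gb * multiplier

# Example: an 8B-parameter model with a trtllm_buildable profile.
print(f"~{estimate_disk_gb(8, trtllm_buildable=True):.0f}GB of disk recommended")
```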

Software Requirements

NVIDIA has guidelines on the software to install on the Kubernetes nodes that will run the NIMs. The most critical requirement is that the necessary versions of the NVIDIA driver and CUDA stack are installed and functional. Docker is not required on Kubernetes nodes, as containerd or CRI-O is sufficient. A quick way to spot-check a node is sketched below.
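
As an illustration, the following sketch spot-checks a node by invoking nvidia-smi, which ships with the driver. This is a convenience check, not a substitute for NVIDIA's validation steps; it assumes you can run a shell command on the node (for example, via a debug pod).

```python
# Minimal node spot-check: confirm the NVIDIA driver is installed and
# responding. A non-zero exit usually means the driver (or the GPU device
# plugin on the node) is not functional.
import subprocess

def check_nvidia_driver() -> bool:
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"NVIDIA driver check failed: {exc}")
        return False
    for line in out.stdout.strip().splitlines():
        print(f"GPU found: {line}")
    return True

if __name__ == "__main__":
    check_nvidia_driver()
```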

Upgrading the execution environment

An execution environment, DRUM NIM Sidecar, is used to facilitate the NVIDIA NIM integration.

Note! When upgrading DataRobot versions, this execution environment is not automatically upgraded to the latest version. To receive the latest updates, ensure that the execution environment upgrade step is carried out during the DataRobot upgrade procedure.

Load NIM Templates

NIM Templates are built on top of the same infrastructure that provides Application Templates and Custom Metrics Templates. For a detailed explanation of all the configuration options for this tool, consult the dedicated documentation. All NIM templates and required execution environments will be installed automatically as part of the standard helm install process.

These templates reference the included NIM execution environment, which contains the DRUM server that proxies requests between the user and the NIM container and overlays MLOps monitoring of requests and other functionality. The templates also define supported Runtime Parameters and recommended resource bundles, along with optional custom.py implementations for specific models (a sketch of such an implementation follows). If changes are required for any NIM, a new version of the datarobot-custom-templates image will need to be installed.
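
For illustration, here is a hypothetical sketch of what a model-specific custom.py might look like, assuming the standard DRUM load_model/score hooks and the NIM's OpenAI-compatible completions endpoint. The actual implementations shipped with each template may differ; the environment variable, column names, and model name below are assumptions.

```python
# Hypothetical sketch of a model-specific custom.py, assuming the standard
# DRUM load_model/score hooks. The real hooks shipped with each template may
# differ; the URL, model name, and column names here are assumptions.
import os

import pandas as pd
import requests

# Where the sidecar reaches the NIM container; variable name is an assumption.
NIM_BASE_URL = os.environ.get("NIM_BASE_URL", "http://localhost:8000")


def load_model(code_dir: str) -> str:
    # The NIM container serves the model itself; nothing to load in-process.
    return NIM_BASE_URL


def score(data: pd.DataFrame, model: str, **kwargs) -> pd.DataFrame:
    # Forward each prompt to the NIM's OpenAI-compatible completions endpoint.
    completions = []
    for prompt in data["promptText"]:  # input column name is an assumption
        resp = requests.post(
            f"{model}/v1/completions",
            json={"model": "meta/llama-3.1-8b-instruct", "prompt": prompt},
            timeout=120,
        )
        resp.raise_for_status()
        completions.append(resp.json()["choices"][0]["text"])
    # Output column name is an assumption; match your target configuration.
    return pd.DataFrame({"completion": completions})
```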

Configure Custom Models with GPU Inference

The NIMs functionality is built on top of the Custom Models platform and specifically requires the GPU support that was added in 10.0. The documentation created as part of that release is still applicable here: Custom Models with GPU Inference. The main point to take from that guide is to define the appropriate resource bundles. The bundles are highly dependent on the customer hardware and the specific NIMs you intend to run; a hypothetical lookup is sketched below.
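
As a rough illustration only, the sketch below shows how one might script a check that a GPU-capable bundle exists before configuring a NIM. The endpoint path and field names are assumptions, not the documented DataRobot API; refer to the Custom Models with GPU Inference guide for the supported interface.

```python
# Hypothetical sketch only: list GPU-capable resource bundles before
# configuring a NIM. The endpoint path and response field names are
# assumptions, not the documented DataRobot API.
import os

import requests

endpoint = os.environ["DATAROBOT_ENDPOINT"]      # e.g. https://<host>/api/v2
token = os.environ["DATAROBOT_API_TOKEN"]

resp = requests.get(
    f"{endpoint}/resourceBundles/",              # assumption: illustrative path
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for bundle in resp.json().get("data", []):
    if bundle.get("gpuCount", 0) > 0:            # keep only GPU bundles
        print(bundle["name"], "-", bundle["gpuCount"], "GPU(s)")
```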

Configure Feature Flags

The ENABLE_NIM_MODELS feature flag must be turned on to utilize the NIM functionality. This feature flag also has the following dependencies; a scripted check of the full chain is sketched after the flag lists below:

  • ENABLE_MLOPS: NIMs are part of MLOps
  • ENABLE_CUSTOM_INFERENCE_MODEL: NIMs are built on the Custom Models Platform
  • ENABLE_CUSTOM_MODEL_GPU_INFERENCE: NIMs require GPU access
  • ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES: Dependency of the above flag
  • ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS: Internet access is required to download model weights
  • ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE: Many of the NIMs are LLMs which require the textGeneration target type

Other GenAI Related Flags:

  • ENABLE_COMPLIANCE_DOCUMENTATION
  • ENABLE_CUSTOM_MODEL_FEATURE_FILTERING
  • ENABLE_CUSTOM_MODEL_GITHUB_CI_CD
  • ENABLE_CUSTOM_MODEL_PREDICT_RESPONSE_EXTRA_MODEL_OUTPUT
  • ENABLE_GENAI_EXPERIMENTATION
  • ENABLE_MLOPS_ACTUALS_STORAGE
  • ENABLE_MMM_DATA_QUALITY
  • ENABLE_MMM_GLOBAL_MODELS_IN_MODEL_REGISTRY
  • ENABLE_MMM_VDB_DEPLOYMENT_TYPE
  • ENABLE_MODERATION_GUARDRAILS
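
To verify the dependency chain before deploying, something like the following sketch can help. The flag names come from the lists above; fetch_enabled_flags is a placeholder, since flags are actually inspected and toggled through your installation's admin tooling.

```python
# Sketch: verify the NIM feature-flag dependency chain from the lists above.
# fetch_enabled_flags() is a placeholder; flags are inspected and toggled
# through your installation's admin tooling.

REQUIRED_FLAGS = {
    "ENABLE_NIM_MODELS",
    "ENABLE_MLOPS",
    "ENABLE_CUSTOM_INFERENCE_MODEL",
    "ENABLE_CUSTOM_MODEL_GPU_INFERENCE",
    "ENABLE_MLOPS_RESOURCE_REQUEST_BUNDLES",
    "ENABLE_PUBLIC_NETWORK_ACCESS_FOR_ALL_CUSTOM_MODELS",
    "ENABLE_MLOPS_TEXT_GENERATION_TARGET_TYPE",
}


def fetch_enabled_flags() -> set:
    raise NotImplementedError("read flags via your admin tooling")


def check_flags(enabled: set) -> bool:
    missing = REQUIRED_FLAGS - enabled
    if missing:
        print("Missing required flags:", ", ".join(sorted(missing)))
        return False
    print("All required NIM feature flags are enabled.")
    return True
```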

NVIDIA Enterprise License and NGC API Key

Usage of NIMs requires an NVIDIA Enterprise License. Follow NVIDIA's documentation to create your NGC account if you do not already have one.

Generate an API Key from NGC

Once you have an account, sign in and browse to the Setup page, accessible from the dropdown that appears after clicking your profile in the top-right portion of the page. From there:

  • Click the Generate API Key button.
  • Click Generate Personal Key in the top right of the page.
  • Fill in the form modal:
      • Give the key a descriptive Key Name.
      • Set the expiration to something appropriate for your IT policies.
      • Select Private Registry and NGC Catalog from the dropdown in the Services Included field.
  • Click Generate Personal Key.
  • Be sure to record your key, as you will not be able to view it again (a quick way to verify it is sketched below).
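
A quick way to sanity-check a freshly generated key is to authenticate against NVIDIA's container registry. Per NVIDIA's documentation, nvcr.io accepts the literal username $oauthtoken with the API key as the password; the sketch below assumes a local docker CLI and that the key is exported as NGC_API_KEY.

```python
# Sanity-check a new NGC key by logging in to nvcr.io. The registry username
# is the literal string "$oauthtoken" and the password is the API key itself.
# Requires a local docker CLI.
import os
import subprocess

def verify_ngc_key(api_key: str) -> bool:
    result = subprocess.run(
        ["docker", "login", "nvcr.io", "--username", "$oauthtoken",
         "--password-stdin"],
        input=api_key, text=True, capture_output=True,
    )
    if result.returncode != 0:
        print("NGC login failed:", result.stderr.strip())
        return False
    print("NGC key accepted by nvcr.io")
    return True

if __name__ == "__main__":
    verify_ngc_key(os.environ["NGC_API_KEY"])
```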

Save NGC Key into DataRobot Secure Configuration

The DataRobot NIM Templates use Secure Configuration to securely transmit the NGC Key from the DataRobot credentials store to the running NIM container. The credential is used to pull the base Docker image when it is not cached locally, and to pull the model weights from NGC at startup (local weight caching is not supported in 11.0).

You must create a new Secure Configuration entry of the new type NGC API Token. The entry must be shared, with at least Consumer-level access, with the organization or users that will deploy NIMs. A hypothetical API sketch follows.
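
For illustration, a hypothetical sketch of scripting this step is shown below. The endpoint path and payload fields are assumptions rather than the documented API; the supported workflow is through the Secure Configuration UI described above.

```python
# Hypothetical sketch of creating the NGC key entry via the REST API. The
# endpoint path and payload fields are assumptions for illustration only;
# use the Secure Configuration UI for the supported workflow.
import os

import requests

endpoint = os.environ["DATAROBOT_ENDPOINT"]
headers = {"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"}

resp = requests.post(
    f"{endpoint}/secureConfigurations/",         # assumption: illustrative path
    headers=headers,
    json={
        "name": "ngc-api-token",
        "type": "ngc_api_token",                 # the new NGC API Token type
        "value": os.environ["NGC_API_KEY"],
    },
)
resp.raise_for_status()
print("Created secure configuration:", resp.json().get("id"))
# Remember to share the entry with at least Consumer-level access to the
# organization or users that will deploy NIMs.
```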