NIM Containers – Troubleshooting Appendix¶
See the Air‑Gap configuration guide.
Profile Not Found¶
Symptom¶
The container terminates during startup with a NoSuchKey error:
Selected profile: 74bfd8b2df5eafe452a9887637eef4820779fb4e1edb72a4a7a2a1a2d1e6480b (tensorrt_llm-a10g-bf16-tp1-pp1-throughput)
...
Exception: S3 GetObject failed: service error: NoSuchKey: The specified key does not exist.
Root Cause¶
The specified model profile is not present in the object storage bucket.
Resolution¶
- Copy the Selected profile name from the log output, e.g. tensorrt_llm-a10g-bf16-tp1-pp1-throughput.
- Follow the documented procedures to download and upload the required profiles:
  - Download profiles
  - Upload profiles
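When the startup log is long, the profile name can be pulled out programmatically. The following is a minimal sketch, assuming the log line has the same shape as the Symptom example above; the helper name `extract_selected_profile` is hypothetical and not part of NIM.

```python
import re

# Matches lines shaped like the Symptom example:
#   Selected profile: <hex id> (<profile name>)
PROFILE_LINE = re.compile(
    r"Selected profile:\s*(?P<id>[0-9a-f]+)\s*\((?P<name>[^)]+)\)"
)


def extract_selected_profile(log_text):
    """Return (profile_id, profile_name) for the first match, or None."""
    match = PROFILE_LINE.search(log_text)
    if match is None:
        return None
    return match.group("id"), match.group("name")


sample_log = (
    "Selected profile: "
    "74bfd8b2df5eafe452a9887637eef4820779fb4e1edb72a4a7a2a1a2d1e6480b "
    "(tensorrt_llm-a10g-bf16-tp1-pp1-throughput)\n"
    "Exception: S3 GetObject failed: service error: NoSuchKey\n"
)
print(extract_selected_profile(sample_log))
```

The returned name is the value to use when following the download and upload procedures.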
Lazy Instance Previously Poisoned¶
Symptom¶
pyo3_runtime.PanicException: Lazy instance has previously been poisoned
Root Cause / Resolution¶
- Endpoint is HTTP: NIM Containers require all communication to occur over HTTPS.
- Container Cannot Verify HTTPS Certificate: Ensure your Public CA or Private CA bundle is mounted to all DataRobot Kubernetes workloads. Refer to the Public CA configuration example.
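Both causes can be checked from any host that can reach the endpoint. The sketch below is illustrative only: it assumes the endpoint URL and the path to the CA bundle are known, and the function names are hypothetical. It uses only the Python standard library.

```python
import socket
import ssl
from urllib.parse import urlparse


def endpoint_uses_https(endpoint_url):
    """Return True when the endpoint URL uses the https scheme."""
    return urlparse(endpoint_url).scheme == "https"


def certificate_verifies(endpoint_url, ca_bundle_path, timeout=5.0):
    """Attempt a TLS handshake using only the given CA bundle.

    Returns False when the handshake fails (for example, an untrusted
    certificate). Network errors such as DNS failures are not caught
    and propagate to the caller.
    """
    parsed = urlparse(endpoint_url)
    host = parsed.hostname
    port = parsed.port or 443
    context = ssl.create_default_context(cafile=ca_bundle_path)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError:
        return False


print(endpoint_uses_https("http://minio.internal-example.net/"))
```

If `endpoint_uses_https` returns False, switch the endpoint to HTTPS first. If `certificate_verifies` returns False, the CA bundle passed to it does not trust the certificate presented by the endpoint.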
MinIO Connection Error¶
Symptom¶
Exception: S3 GetObject failed: dispatch failure: io error: error trying to connect:
dns error: failed to lookup address information: Name or service not known: dns error
Root Cause¶
NIM supports only the S3 virtual-hosted style (<bucket>.<domain>). As a result, MinIO must be configured to accept wildcard hosts.
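To make the difference concrete, the sketch below builds the URL that each addressing style produces for the same bucket and key. It is an illustration of the naming rule only, not NIM's or MinIO's implementation; the function names and the example key are assumptions.

```python
from urllib.parse import urlparse


def virtual_hosted_url(endpoint_url, bucket, key):
    """Bucket becomes a subdomain: https://<bucket>.<domain>/<key>."""
    parsed = urlparse(endpoint_url)
    return f"{parsed.scheme}://{bucket}.{parsed.netloc}/{key}"


def path_style_url(endpoint_url, bucket, key):
    """Bucket becomes the first path segment: https://<domain>/<bucket>/<key>."""
    parsed = urlparse(endpoint_url)
    return f"{parsed.scheme}://{parsed.netloc}/{bucket}/{key}"


endpoint = "https://minio.internal-example.net/"
print(virtual_hosted_url(endpoint, "nim-bucket", "LICENSE.txt"))
print(path_style_url(endpoint, "nim-bucket", "LICENSE.txt"))
```

In the virtual-hosted form the hostname changes per bucket, so DNS, the TLS certificate, and ingress must all accept names under the MinIO domain. A missing wildcard DNS record produces the lookup failure shown in the Symptom.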
Resolution¶
Option A – Wildcard Configuration (Recommended)
1. Configure DNS to support *.minio.internal-example.net.
2. Ensure the TLS certificate includes the Subject Alternative Name (SAN) for *.minio.internal-example.net.
3. Update ingress rules to allow wildcard hosts.
4. Set the domain name on the MinIO server using the following environment variable:
env:
- name: MINIO_DOMAIN
value: minio.internal-example.net
Option B – Path‑style fallback
1. Create a bucket named after the first label of the MinIO hostname (minio in minio.internal-example.net).
2. Configure NIM:
NIM_REPOSITORY_OVERRIDE=s3://minio/
AWS_ENDPOINT_URL=https://internal-example.net/
3. Set the MINIO_DOMAIN environment variable to internal-example.net.
Errors During Model Profile Uploads¶
Double Bucket Name in URL¶
Symptom:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL:
"https://nim-bucket.nim-bucket.minio.internal-example.net/nim%252Fmeta%252Fllama-3.2-1b-instruct%253Ahf-e9f8eff-nim1.5%252B%253Ffile%253DLICENSE.txt"
Resolution:
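Ensure that the AWS_ENDPOINT_URL environment variable does not include the bucket name. The client prepends the bucket name to the endpoint host, so a bucket name already present in the URL appears twice.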
# Incorrect
export AWS_ENDPOINT_URL=https://nim-bucket.minio.internal-example.net/
# Correct
export AWS_ENDPOINT_URL=https://minio.internal-example.net/
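A quick pre-flight check can catch this before an upload is attempted. The following is a minimal sketch under the assumption that the bucket name, if present, appears as the first label of the endpoint hostname, as in the incorrect example above; the function name is hypothetical.

```python
from urllib.parse import urlparse


def endpoint_includes_bucket(endpoint_url, bucket):
    """Return True when the endpoint hostname starts with the bucket name."""
    host = urlparse(endpoint_url).hostname or ""
    # Compare only the first DNS label, e.g. "nim-bucket" in
    # "nim-bucket.minio.internal-example.net".
    return host.split(".")[0] == bucket


for url in (
    "https://nim-bucket.minio.internal-example.net/",
    "https://minio.internal-example.net/",
):
    print(url, endpoint_includes_bucket(url, "nim-bucket"))
```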
Signature Does Not Match¶
Symptom:
boto3.exceptions.S3UploadFailedError: Failed to upload ...
An error occurred (SignatureDoesNotMatch) when calling the PutObject operation:
The request signature we calculated does not match the signature you provided.
Check your key and signing method.
Root cause: This error is typically caused by an incorrect access key or secret key. Review the logs for HTTP 403 responses to confirm an authentication failure.
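Before re-running the upload, it can help to rule out the simplest credential mistakes. The sketch below assumes the upload script reads credentials from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which boto3 supports; the specific checks (unset values, stray whitespace from copy and paste) are illustrative assumptions rather than an exhaustive diagnosis, and the function name is hypothetical.

```python
import os

# Standard variable names read by boto3; whether a given deployment
# uses them is an assumption of this sketch.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credential_problems(env):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
    return problems


for problem in credential_problems(os.environ):
    print(problem)
```

An empty result does not prove the keys are correct; it only means they are present and free of stray whitespace. The values still need to match the credentials configured on the MinIO server.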