# Best practices and troubleshooting

> Best practices and troubleshooting - Container design, production deployments, security, and common
> issues.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:09.609877+00:00` (UTC).

## Primary page

- [Best practices and troubleshooting](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html): Full documentation for this topic (HTML).

## Sections on this page

- [Best practices](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#best-practices-section): In-page section heading.
- [Container design](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#container-design): In-page section heading.
- [Production deployments](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#production-deployments): In-page section heading.
- [Security](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#security): In-page section heading.
- [Troubleshooting](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#troubleshooting): In-page section heading.
- [Checking workload status details](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#checking-status-details): In-page section heading.
- [Common issues](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#common-issues): In-page section heading.
- [Workload stuck in initializing status](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#stuck-initializing): In-page section heading.
- [Workload enters errored status](https://docs.datarobot.com/en/docs/api/dev-learning/workload-api/best-practices.html#errored-status): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.

## Documentation content

This page covers recommended practices for container design, production deployments, and security, along with troubleshooting steps for common issues.

## Best practices

### Container design

- Implement health checks: Always provide liveness, readiness, and startup probes.
- Set appropriate timeouts: GPU-heavy workloads take longer to start, so allow enough time for startup before probes report a failure.
- Use appropriate resource requests: Right-size CPU and memory to avoid overprovisioning.

### Production deployments

- Promote artifacts: Lock artifacts before deploying to production so that they remain immutable.
- Enable autoscaling: Configure scaling policies appropriate for your traffic patterns.
- Use resource bundles: Leverage predefined resource bundles for consistent GPU allocation.

### Security

- Use private registries: Store container images in secure, private registries.
- Avoid hardcoded secrets: Use environment variables or secret management.
- Implement HTTPS probes: Use `scheme: HTTPS` for health checks when appropriate.
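
As a minimal sketch of the "avoid hardcoded secrets" practice, a script can fail fast when a required environment variable is missing instead of falling back to a value embedded in the file. `require_env` is a hypothetical helper name, not part of any DataRobot tooling. The variable names in the usage comment are the ones used in the `curl` examples on this page.

```shell
# Hypothetical helper: return non-zero if any named environment variable
# is unset or empty. Only variable names are printed, never their values.
# Note: printenv only sees exported variables.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: stop before making any API call if credentials are not configured.
#   require_env DATAROBOT_ENDPOINT DATAROBOT_API_TOKEN || exit 1
```

Checking every name before returning means a single run reports all missing variables rather than only the first one.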

## Troubleshooting

### Checking workload status details

When a workload enters an unexpected state, the `statusDetails` field provides diagnostic information:

```
# WORKLOAD_ID is a placeholder variable; set it to the ID of the workload to inspect.
curl -s -X GET "${DATAROBOT_ENDPOINT}/console/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.statusDetails'
```

The `statusDetails` object contains two key fields:

| Field | Description |
| --- | --- |
| `conditions` | Array of Kubernetes-style conditions indicating component states. |
| `logTail` | Array of recent container log lines captured during startup. |
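
To scan `conditions` quickly, the response can be piped through a small `jq` filter that prints one line per condition. This is only a sketch: `summarize_conditions` is a hypothetical helper name, and the keys `type`, `status`, `reason`, and `message` are assumed from the Kubernetes condition convention rather than confirmed by this page.

```shell
# Hypothetical helper: read a workload response on stdin and print one
# tab-separated line per condition (type, status, reason, message).
# Assumption: conditions use Kubernetes-style keys; absent keys print as "-".
summarize_conditions() {
  jq -r '
    (.statusDetails.conditions // [])[]
    | [.type, .status, .reason, .message]
    | map(if . == null then "-" else tostring end)
    | @tsv
  '
}

# Example: pipe the full workload response (not only .statusDetails) into it.
#   curl -s ... | summarize_conditions
```

A response without `statusDetails` or `conditions` produces no output rather than an error, so the helper is safe to use on workloads in any state.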

### Common issues

#### Workload stuck in initializing status

- Check `statusDetails.conditions` for scheduling issues.
- Verify the container image is accessible from the cluster.
- Verify resource requests don't exceed cluster capacity.
- Review startup probe configuration.
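
One way to tell a slow start from a stuck workload is to poll the status a bounded number of times and dump the conditions if it never changes. The sketch below rests on several assumptions: `fetch_workload` and `wait_while_initializing` are hypothetical helper names, the state is assumed to be reported as the string `initializing` (compared case-insensitively) in a top-level `status` field, and `--fail` is added so that HTTP errors are not mistaken for a status change.

```shell
# Hypothetical helper: fetch one workload by ID from the endpoint used on this page.
fetch_workload() {
  curl -s --fail -X GET "${DATAROBOT_ENDPOINT}/console/workloads/$1" \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
}

# Usage: wait_while_initializing WORKLOAD_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Prints the new status and returns 0 once the workload leaves "initializing".
# Returns 1 after MAX_ATTEMPTS polls, printing the last conditions to stderr.
wait_while_initializing() {
  local workload_id="$1" max_attempts="${2:-40}" interval="${3:-15}"
  local attempt=0 body="" status=""
  while [ "$attempt" -lt "$max_attempts" ]; do
    body="$(fetch_workload "$workload_id")" || body=""
    status="$(printf '%s' "$body" \
      | jq -r '(.status // "unknown") | tostring | ascii_downcase')" || status=""
    # An empty status means the request or parsing failed; keep polling.
    if [ -n "$status" ] && [ "$status" != "initializing" ]; then
      echo "$status"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  printf '%s' "$body" | jq '.statusDetails.conditions' >&2 || true
  return 1
}
```

With the defaults, the loop waits roughly ten minutes (40 polls, 15 seconds apart). GPU-heavy workloads may need a larger attempt count.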

#### Workload enters errored status

When a workload fails, start by inspecting `statusDetails`:

```
# WORKLOAD_ID is a placeholder variable; set it to the ID of the failed workload.
curl -s -X GET "${DATAROBOT_ENDPOINT}/console/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '{status, statusDetails}'
```
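
To narrow the output to what usually matters for a failure, the same response can be filtered down to the conditions that are not reporting `True`, followed by the captured log lines. This is a sketch under assumptions: `explain_failure` is a hypothetical helper name, and the condition keys (`type`, `status`, `reason`, `message`) follow the Kubernetes convention rather than a schema confirmed by this page.

```shell
# Hypothetical helper: read a workload response on stdin, list conditions
# whose status is not "True", then print the lines captured in logTail.
# Assumption: Kubernetes-style condition keys; logTail is an array of strings.
explain_failure() {
  jq -r '
    .statusDetails as $d
    | "Conditions not reporting True:",
      ( ($d.conditions // [])[]
        | select((.status | tostring | ascii_downcase) != "true")
        | "  \(.type // "?") [\(.reason // "-")] \(.message // "-")" ),
      "Recent log lines:",
      ( ($d.logTail // [])[] | "  \(.)" )
  '
}

# Example: pipe the full workload response into the helper.
#   curl -s ... | explain_failure
```

A condition with a missing `status` is treated as not `True` and is listed, so incomplete condition entries stay visible during debugging.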
