Manage model resources
After creating a custom inference model, you can configure the resources the model consumes to facilitate smooth deployment and minimize potential environment errors in production.
You can monitor a custom model's resource allocation from the Assemble tab. The resource settings are listed below the deployment status.
To edit any resource settings, select the pencil icon. Note that users can set the maximum memory allocated for a model, but only organization admins can configure additional resource settings.
Warning
DataRobot recommends configuring resource settings only when necessary. When you configure the Memory setting below, you set the Kubernetes memory "limit" (the maximum allowed memory allocation); however, you can't set the memory "request" (the minimum guaranteed memory allocation). For this reason, it is possible to set the "limit" value too far above the default "request" value. An imbalance between the memory "request" and the memory usage allowed by the increased "limit" can result in the custom model exceeding the memory consumption limit. As a result, you may experience unstable custom model execution due to frequent eviction and relaunching of the custom model. If you require an increased Memory setting, you can mitigate this issue by increasing the "request" at the Organization level; for more information, contact DataRobot Support.
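The "request"/"limit" distinction above comes from Kubernetes container resource management. The following fragment is an illustrative sketch only (the values are hypothetical, and DataRobot manages these fields internally; the Memory setting controls only the limit):

```yaml
# Illustrative Kubernetes container resources block; not a DataRobot artifact.
# "requests" is the guaranteed minimum; "limits" is the ceiling above which
# the container can be evicted.
resources:
  requests:
    memory: "512Mi"   # guaranteed minimum (raised at the Organization level)
  limits:
    memory: "4Gi"     # maximum allowed (the Memory setting)
```

A wide gap between the request and the limit means the scheduler only guarantees the smaller amount, so a model routinely using memory near the limit is at risk of eviction under node memory pressure.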
Configure the resource allocations that appear in the modal.
| Resource | Description |
|---|---|
| Memory | Determines the maximum amount of memory that can be allocated to a custom inference model. If the model exceeds this limit, the system evicts it. If this occurs during testing, the test is marked as a failure. If this occurs when the model is deployed, Kubernetes automatically relaunches the model. |
| Replicas | Sets the number of replicas run in parallel to balance workloads while a custom model is running. Depending on the custom model's speed, increasing the number of replicas may not improve performance. |
| Network access | Configures the custom model's egress traffic. Choose between no access and public access. |
Once you have fully configured the resource settings for a model, click Save. This creates a new version of the custom model with the edited resource settings applied.