CustomTasks configurations¶
Required installation values: data transmission image pull secrets¶
The data transmission image is packaged with the app and should already be present in the local Docker repository. To use it, configure the following settings.
CUSTOM_TASK_DATA_TRANSMISSION_IMAGE_NAME
- description: The Docker location of the custom-task-data-transmission image, which handles upload and download during a CustomTask's fit action.
- default: docker.io/datarobot/custom-task-data-transmission:11.0.0-12d6cc08996ebcafc0f107e78d8fa4b270a40501-71
CUSTOM_TASK_DATA_TRANSMISSION_IMAGE_PULL_SECRET
- description: The name of an already created V1Secret holding the image pull secrets that CustomTaskExecutions use to pull the static data transmission image from the repository where it is hosted.
- default: "datarobot-image-pullsecret"
Both values must be set; the defaults should already be correct. If they are, the test pod described under "Testing connectivity" should be able to pull the image.
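For reference, a pre-created image pull secret is an ordinary Kubernetes Secret of type `kubernetes.io/dockerconfigjson`. The manifest below is a minimal sketch, not a definitive setup: the namespace is a placeholder borrowed from the network policy examples later on this page, and the data value is a stand-in that must be replaced with your own base64-encoded registry credentials.

```yaml
# Sketch of a pre-created image pull secret (namespace and data are placeholders).
apiVersion: v1
kind: Secret
metadata:
  name: datarobot-image-pullsecret   # should match CUSTOM_TASK_DATA_TRANSMISSION_IMAGE_PULL_SECRET
  namespace: awesome-datarobot       # placeholder; use the namespace where custom task pods run
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config.json>   # placeholder, not a real value
```

The secret must exist in the same namespace as the pods that reference it, since Kubernetes resolves image pull secrets per namespace.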
Important installation values¶
These are the values most likely to need setting at installation. The shared namespace defaults to the Helm chart's namespace. In addition, you will almost certainly need to set up network policies for custom tasks to run successfully. See the network policies section for details.
CUSTOM_TASK_EXECUTION_SHARED_IMAGE_PULL_SECRET
- description: The name of an already created V1Secret holding the image pull secrets that CustomTaskExecutions use to configure V1Pods. It is used for execution images hosted on the Image Build Service. If the value is empty, a secret name and V1Secret are created per execution at runtime.
- default: ''
CUSTOM_TASK_EXECUTION_SHARED_NAMESPACE
- description: The namespace to which all CustomTask fit jobs are assigned. If the value is empty, the default behavior of one namespace per fit job applies.
- default: "{{ .Release.Namespace }}"
CUSTOM_TASK_EXECUTION_NODE_SELECTOR_KEY
- description: Run CustomTask fit on nodes with the label {this_value: CUSTOM_TASK_EXECUTION_NODE_SELECTOR_VALUE}. If either CUSTOM_TASK_EXECUTION_NODE_SELECTOR_KEY or CUSTOM_TASK_EXECUTION_NODE_SELECTOR_VALUE is empty, no node selector is added. If you specify a node selector that matches no nodes, jobs will hang indefinitely.
- default: ''
CUSTOM_TASK_EXECUTION_NODE_SELECTOR_VALUE
- description: Run CustomTask fit on nodes with the label {CUSTOM_TASK_EXECUTION_NODE_SELECTOR_KEY: this_value}. If either CUSTOM_TASK_EXECUTION_NODE_SELECTOR_KEY or CUSTOM_TASK_EXECUTION_NODE_SELECTOR_VALUE is empty, no node selector is added. If you specify a node selector that matches no nodes, jobs will hang indefinitely.
- default: ''
CUSTOM_TASK_EXECUTION_NODE_TOLERATION
- description: Run CustomTask fit on nodes with specific taints. If the value is empty, no node tolerations are added. Only one toleration is allowed.
- default: ''
CUSTOM_TASK_EXECUTION_SERVICE_ACCOUNT_NAME
- description: When set, the name of the service account used to run pods during a custom task execution.
- default: ''
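The node selector, toleration, and service account settings correspond to standard Kubernetes scheduling fields. The fragment below is a sketch of the fields a custom task pod might carry once they are set. The label, taint, and service account names are hypothetical, and the exact string format expected by CUSTOM_TASK_EXECUTION_NODE_TOLERATION is not documented here, so only the resulting Kubernetes fields are shown.

```yaml
# Illustrative pod spec fragment; all names and values are hypothetical.
spec:
  nodeSelector:
    workload: custom-tasks        # NODE_SELECTOR_KEY=workload, NODE_SELECTOR_VALUE=custom-tasks
  tolerations:
  - key: dedicated                # hypothetical taint key on the target nodes
    operator: Equal
    value: custom-tasks
    effect: NoSchedule
  serviceAccountName: custom-task-runner   # hypothetical service account name
```

A pod whose `nodeSelector` matches no node stays in the Pending state, which is why a mistyped key or value makes fit jobs appear to hang.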
All other settable values¶
Below are all the other values that control custom task settings. Most have sensible defaults and rarely need to be changed.
| Name | Description | Default |
|---|---|---|
| CUSTOM_TASKS_MAX_NUMBER_OF_TASKS_IN_BP | The maximum number of CustomTasks that can be in a single UserBlueprint. | 3 |
| CUSTOM_TASK_EXECUTION_AUX_CPU_LIMIT | The CPU limit for containers that transfer user data to/from DataRobot | 1 |
| CUSTOM_TASK_EXECUTION_AUX_CPU_REQUEST | The CPU request for containers that transfer user data to/from DataRobot | 0 |
| CUSTOM_TASK_EXECUTION_AUX_MEM_LIMIT | The memory limit for containers that transfer user data to/from DataRobot | 104857600 |
| CUSTOM_TASK_EXECUTION_AUX_MEM_REQUEST | The memory request for containers that transfer user data to/from DataRobot | 104857600 |
| CUSTOM_TASK_EXECUTION_CPU_LIMIT | The CPU limit for the container running a user's CustomTask fit code. | 4 |
| CUSTOM_TASK_EXECUTION_CPU_REQUEST | The requested CPU for the container running a user's CustomTask fit code. | 0 |
| CUSTOM_TASK_EXECUTION_IMAGE_BUILD_TIMEOUT | null | 1200 |
| CUSTOM_TASK_EXECUTION_MAX_ARTIFACT_SIZE | The maximum size for artifacts created by a user's CustomTask fit code. | 10737418240 |
| CUSTOM_TASK_EXECUTION_MEM_LIMIT | The memory limit for the container running a user's CustomTask fit code. | 4294967296 |
| CUSTOM_TASK_EXECUTION_MEM_REQUEST | The requested memory for the container running a user's CustomTask fit code. | 134217728 |
| CUSTOM_TASK_EXECUTION_STORAGE_SIZE | The maximum allowed disk space usage for all artifacts generated by a user's CustomTask fit code. This does not include storage of the training data which is handled separately by DataSet code. | 16106127360 |
| CUSTOM_TASK_EXECUTION_TIMEOUT | In seconds, the max time for the container running a user's CustomTask fit code. | 7200 |
| CUSTOM_TASK_FIT_DYNAMIC_MEMORY_LIMIT_BUFFER | If measuring memory usage during fit to optimize an LRS, the amount of extra memory added beyond that value when creating the lrs. | 1073741824 |
| CUSTOM_TASK_FIT_START_TIMEOUT | The maximum time in seconds to wait for a CustomTask fit job to start. This includes creating the request, updating mongo, building all images (including possibly creating new ExecutionEnvironmentVersions) and starting the K8s V1Job. | 10800 |
| CUSTOM_TASK_LRS_MAX_RATE_LIMIT_EXCEEDED_WAIT | The maximum time in seconds to wait for an LRS to become available for the user. Each user has a maximum number of allowed LRSes that can be up at any one time. | 10800 |
| CUSTOM_TASK_PREDICT_MEM_LIMIT | The memory limit for an LRS for a Custom(Training)Task as opposed to a Custom(Inference)Model. | 4294967296 |
| CUSTOM_TASK_VERSION_MAX_FILES | Maximum number of files allowed to upload to a custom task version. Minimum of 10 is to reflect that we support metadata files including input/output schema and hyperparameters, so it should not be 1. | 100 |
| ENABLE_CUSTOM_TASK_HYPERPARAMETERS | null | False |
| EPHEMERAL_LRS_CPU_REQUEST | The amount of CPU resource for an LRS that is spun up when training a CustomTask or any other ephemeral LRS. Replaces: CUSTOM_MODEL_EPHEMERAL_PREDICT_CPU_REQUEST | 0.5 |
| EPHEMERAL_LRS_MAXIMUM_LIFETIME | The maximum lifetime, in seconds, for an ephemeral LRS. When an LRS is created for any kind of scoring during any kind of fit job, this is the amount of time until that LRS is automatically cleaned up. | 86400 |
| EPHEMERAL_LRS_MEM_REQUEST | The amount of memory resource for an LRS that is spun up when training a CustomTask or any other ephemeral LRS. Replaces: CUSTOM_TASK_EPHEMERAL_MEM_REQUEST | 134217728 |
| IMAGE_BUILDER_CUSTOM_TASK_EXECUTION_REGISTRY_REPO | The repo where CustomTask docker images are stored. If unset, defaults to IMAGE_BUILDER_CUSTOM_MODELS_REGISTRY_REPO | 'managed-image' |
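The memory and size defaults appear to be byte counts (for example, 4294967296 is 4 × 1024³, or 4 GiB), and the timeouts are stated in seconds. Assuming that reading is correct, the sketch below shows how a few overrides could be computed. It is a flat mapping for illustration only; where these keys belong in your values file depends on your installation and is not shown.

```yaml
# Illustrative overrides only; memory and size values are assumed to be bytes.
CUSTOM_TASK_EXECUTION_MEM_LIMIT: 8589934592           # 8 GiB   = 8 * 1024^3   (default 4294967296 = 4 GiB)
CUSTOM_TASK_EXECUTION_MEM_REQUEST: 268435456          # 256 MiB = 256 * 1024^2 (default 134217728 = 128 MiB)
CUSTOM_TASK_EXECUTION_MAX_ARTIFACT_SIZE: 5368709120   # 5 GiB   = 5 * 1024^3   (default 10737418240 = 10 GiB)
CUSTOM_TASK_EXECUTION_TIMEOUT: 14400                  # 4 hours in seconds     (default 7200 = 2 hours)
```

Keeping each request at or below its matching limit follows the usual Kubernetes rule that a container's resource request cannot exceed its limit.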
Setting up network policies¶
A default network policy that lets custom tasks interact with DataRobot is configured automatically. In some situations you may want additional policies, such as a deny-all policy. Examples are provided below.
All custom tasks have the label task-type=custom-task-fit, and all tests and network policies must use this label.
Testing connectivity¶
You may not need to configure any network policies at all. To check, create the test pod below, shell into it, and run curl http://datarobot-nginx:80/config, replacing the address with your load balancer (LB) URL.
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        datarobot-version: 10.2.0
        task-type: custom-task-fit # MUST BE SET LIKE THIS!
      name: connectivity-test
      namespace: dr-app-charts-r10p2-air-gap-daily # SET TO CORRECT NAMESPACE
    spec:
      containers:
      - command:
        - sh
        - -c
        - sleep 500
        image: docker.io/datarobot/custom-task-data-transmission:11.0.0-12d6cc08996ebcafc0f107e78d8fa4b270a40501-71 # set to the value of CUSTOM_TASK_DATA_TRANSMISSION_IMAGE_NAME
        imagePullPolicy: IfNotPresent
        name: generic-container
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          runAsGroup: 2501
          runAsNonRoot: true
          runAsUser: 2500
        volumeMounts:
        - mountPath: /mnt
          name: data-store
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      imagePullSecrets:
      - name: datarobot-image-pullsecret
      - name: datarobot-image-pull-secret
      securityContext:
        fsGroup: 4000
      serviceAccountName: default
      volumes:
      - emptyDir: {}
        name: data-store
Example network policies¶
All custom tasks have the label task-type=custom-task-fit. You may want to set up a default deny-all policy or grant explicit access to public IPs.
Every policy must use the same pod selector, task-type=custom-task-fit.
- A deny all policy that blocks all traffic to and from the custom task:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: custom-task-deny-all
      namespace: awesome-datarobot
    spec:
      podSelector:
        matchLabels:
          task-type: custom-task-fit
      policyTypes:
      - Ingress
      - Egress
- A public policy that allows traffic to the internet:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: custom-task-public-access
      namespace: awesome-datarobot
    spec:
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
      podSelector:
        matchLabels:
          egress-network-access: public
          task-type: custom-task-fit
      policyTypes:
      - Egress
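For the "explicit access to public IPs" case, a narrower policy can allow egress to a single address range instead of the whole internet. The manifest below is a sketch under stated assumptions: the policy name is hypothetical, the namespace is the same placeholder used in the examples above, and the CIDR is a documentation-only range (RFC 5737) that must be replaced with the real destination.

```yaml
# Sketch: allow custom task pods to reach one external range over HTTPS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: custom-task-allow-example-api   # hypothetical name
  namespace: awesome-datarobot          # placeholder namespace
spec:
  podSelector:
    matchLabels:
      task-type: custom-task-fit        # required selector for all custom task policies
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24            # documentation-only range; replace with the real address
    ports:
    - protocol: TCP
      port: 443
```

Kubernetes network policies are additive, so this rule extends whatever the default policy already allows rather than replacing it, and it can be combined with a deny-all policy.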
Restricted network installation guide¶
Setting up in a restricted network requires: