Aryn OCR Deployment Guide
Starting with release 11.4, DataRobot supports enhanced Optical Character Recognition (OCR) through Aryn software. The DataRobot installation bundle includes the image required to run the Aryn service in any cloud. The service also requires GPU nodes to be available in the Kubernetes cluster.
Requirements
Minimum GPU requirements:
| GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
|---|---|---|---|---|---|
| NVIDIA T4/L4 | 16 GiB | 32 GiB | 8 | 200 GiB | Azure: Standard_NC4as_T4_v3; AWS: g4dn.2xlarge; Google Cloud: g2-standard-8 |
Recommended GPU nodes:
| GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
|---|---|---|---|---|---|
| NVIDIA T4/L4 | 24 GiB | 64 GiB | 16 | 200 GiB | Azure: Standard_NC16as_T4_v3; AWS: g6.4xlarge; Google Cloud: g2-standard-16 |
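Before enabling the service, it can help to confirm that the cluster actually advertises GPU capacity. The following is a minimal sketch, assuming the NVIDIA device plugin is installed so that nodes expose the `nvidia.com/gpu` resource; the function name `list_gpu_nodes` is hypothetical and not part of DataRobot or Aryn tooling.

```shell
# Hypothetical helper: list every node with its allocatable GPU count.
# Assumes the NVIDIA device plugin exposes the nvidia.com/gpu resource;
# nodes without that resource show <none> in the GPU column.
list_gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Calling `list_gpu_nodes` against the target cluster prints one row per node. The dots in the resource name are escaped because `custom-columns` would otherwise treat them as path separators.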
Optional: Labeled Nodes
Once Aryn is launched, no other pod can share its node. To avoid tying up a high-cost GPU such as an H100 unless it is actually required, label the node(s) intended for Aryn, for example:
kubectl label nodes <your-node-name> datarobot.com/node-capability=aryn-gpu
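To check that the label landed on the intended node, or to undo it, something like the sketch below may be useful. The function names `list_aryn_nodes` and `unlabel_aryn_node` are hypothetical; only the label key and value come from the command above.

```shell
# Hypothetical helper: list the nodes that carry the Aryn label.
list_aryn_nodes() {
  kubectl get nodes -l datarobot.com/node-capability=aryn-gpu
}

# Hypothetical helper: remove the label from a node.
# A trailing dash after the key tells kubectl to delete that label.
unlabel_aryn_node() {
  kubectl label nodes "$1" datarobot.com/node-capability-
}
```

For example, `list_aryn_nodes` should show exactly the nodes reserved for Aryn, and `unlabel_aryn_node <your-node-name>` releases a node that was labeled by mistake.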
Enablement
Override the configuration with the following values:
global:
  aryn_ocr:
    enabled: true
ocr-service:
  enabled: true
aryn-ocr:
  # Set the selector if nodes were specifically labeled as described in the `Optional: Labeled Nodes` section
  nodeSelector:
    "datarobot.com/node-capability": aryn-gpu
  nodeAffinity:
    enabled: false
Extras
The Aryn OCR service configuration also supports taints and tolerations, which ensure that a labeled node is not occupied by any other service:
- Assign your node an additional taint: dedicated=aryn-ocr:NoExecute
- Override the configuration with:
global:
  aryn_ocr:
    enabled: true
ocr-service:
  enabled: true
aryn-ocr:
  # Set the selector if nodes were specifically labeled as described in the `Optional: Labeled Nodes` section
  nodeSelector:
    "datarobot.com/node-capability": aryn-gpu
  nodeAffinity:
    enabled: false
  tolerations:
    # Allow scheduling on nodes with GPU taint
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoExecute"
    - key: "dedicated"
      operator: "Equal"
      value: "aryn-ocr"
      effect: "NoExecute"
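The taint from the first step can be applied with `kubectl taint`. The sketch below is one way to do it, not an official procedure; the function names `taint_aryn_node` and `show_node_taints` are hypothetical, while the taint key, value, and effect are taken from the steps above.

```shell
# Hypothetical helper: taint a node so that only pods tolerating
# dedicated=aryn-ocr can run on it. The NoExecute effect also evicts
# pods already running on the node that lack a matching toleration.
taint_aryn_node() {
  kubectl taint nodes "$1" dedicated=aryn-ocr:NoExecute
}

# Hypothetical helper: print the taints currently set on a node.
show_node_taints() {
  kubectl get node "$1" -o jsonpath='{.spec.taints}'
}
```

Running `taint_aryn_node <your-node-name>` followed by `show_node_taints <your-node-name>` should show the `dedicated` taint with value `aryn-ocr`. Because `NoExecute` evicts non-tolerating pods, it is probably safest to apply the taint before other workloads settle on the node.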