
Aryn OCR Deployment Guide

Starting with version 11.4, DataRobot can enable enhanced Optical Character Recognition (OCR) via Aryn software. The DataRobot installation bundle includes the image required to run the Aryn service in any cloud. The service also requires GPU nodes to be available in the Kubernetes cluster.

Requirements

Minimum GPU node requirements:

| GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
|---|---|---|---|---|---|
| NVIDIA T4/L4 | 16 GiB | 32 GiB | 8 | 200 GiB | Azure: Standard_NC4as_T4_v3; AWS: g4dn.2xlarge; Google Cloud: g2-standard-8 |

Recommended GPU nodes:

| GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
|---|---|---|---|---|---|
| NVIDIA T4/L4 | 24 GiB | 64 GiB | 16 | 200 GiB | Azure: Standard_NC16as_T4_v3; AWS: g6.4xlarge; Google Cloud: g2-standard-16 |
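To check which existing nodes match the tables above, a minimal sketch like the following lists each node's instance type next to its allocatable GPU, CPU, and memory. It assumes the NVIDIA device plugin is installed, so GPUs appear as the `nvidia.com/gpu` resource, and that nodes carry the standard `node.kubernetes.io/instance-type` label; `list_gpu_nodes` is a hypothetical helper name, not part of the DataRobot bundle.

```shell
# Hypothetical helper: print each node's instance type and allocatable
# GPU, CPU and memory for comparison with the requirement tables.
# Assumes the NVIDIA device plugin exposes GPUs as nvidia.com/gpu;
# nodes without GPUs show <none> in the GPU column.
list_gpu_nodes() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,GPU:.status.allocatable.nvidia\.com/gpu,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

Run `list_gpu_nodes` against the target cluster. Allocatable values are what the scheduler can actually assign, so they are typically somewhat lower than the nominal RAM and CPU of the VM type.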

Optional: Labeled Nodes

Once Aryn is launched, no other pod can share its node. Aryn should also not be scheduled on a high-cost GPU such as an H100 unless that is required. It is therefore recommended to label the node(s) intended for Aryn, for example:

```shell
kubectl label nodes <your-node-name> datarobot.com/node-capability=aryn-gpu
```

Alternatively, make sure the nodes are provisioned with this label.
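When several nodes need the label, the command can be wrapped in a small loop that also verifies the result. This is a sketch only: `label_aryn_nodes` is a hypothetical helper, and `--overwrite` is added so that an existing value for the same label key is replaced instead of causing an error.

```shell
# Hypothetical helper: apply the Aryn label to one or more nodes,
# then list every node that carries it.
label_aryn_nodes() {
  if [ "$#" -eq 0 ]; then
    echo "usage: label_aryn_nodes <node-name> [<node-name>...]" >&2
    return 1
  fi
  for node in "$@"; do
    # --overwrite replaces an existing value for the same label key
    kubectl label nodes "$node" datarobot.com/node-capability=aryn-gpu --overwrite || return 1
  done
  # Verify: show all nodes selected by the label
  kubectl get nodes -l datarobot.com/node-capability=aryn-gpu
}
```

For example, `label_aryn_nodes gpu-node-1 gpu-node-2`, where the node names are placeholders.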

Enablement

Override the configuration with the following values:

```yaml
global:
  aryn_ocr:
    enabled: true

  ocr-service:
    enabled: true

  aryn-ocr:
    # Set the selector if nodes were labeled as described in the `Optional: Labeled Nodes` section
    nodeSelector:
      "datarobot.com/node-capability": aryn-gpu
    nodeAffinity:
      enabled: false
```
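After the configuration is applied, one way to confirm that the `nodeSelector` took effect is to list the pods running on the labeled node. The sketch below assumes you know the node name; `pods_on_node` is a hypothetical helper. Besides the Aryn OCR pod, expect DaemonSet pods that run on every node; exact pod names depend on your installation.

```shell
# Hypothetical helper: list all pods, in all namespaces, scheduled on
# the given node, using a field selector on spec.nodeName.
pods_on_node() {
  if [ "$#" -ne 1 ]; then
    echo "usage: pods_on_node <node-name>" >&2
    return 1
  fi
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```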

Extras

The Aryn OCR service configuration also supports taints and tolerations, which give finer control over node allocation and ensure that a labeled node is not occupied by any other service:

- Add an additional taint to your node: `dedicated=aryn-ocr:NoExecute`
- Override the configuration with:

```yaml
global:
  aryn_ocr:
    enabled: true

  ocr-service:
    enabled: true

  aryn-ocr:
    # Set the selector if nodes were labeled as described in the `Optional: Labeled Nodes` section
    nodeSelector:
      "datarobot.com/node-capability": aryn-gpu
    nodeAffinity:
      enabled: false
    tolerations:
      # Allow scheduling on nodes with the GPU taint
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoExecute"
      # Allow scheduling on nodes with the dedicated=aryn-ocr taint
      - key: "dedicated"
        operator: "Equal"
        value: "aryn-ocr"
        effect: "NoExecute"
```
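The taint listed above can be applied with `kubectl taint`. The sketch below wraps it in a hypothetical `taint_aryn_nodes` helper so several labeled nodes can be tainted at once; the key, value, and effect match the `dedicated` toleration in the configuration above.

```shell
# Hypothetical helper: add the dedicated=aryn-ocr:NoExecute taint to one
# or more nodes that already carry the Aryn label.
taint_aryn_nodes() {
  if [ "$#" -eq 0 ]; then
    echo "usage: taint_aryn_nodes <node-name> [<node-name>...]" >&2
    return 1
  fi
  for node in "$@"; do
    # --overwrite replaces an existing taint with the same key and effect
    kubectl taint nodes "$node" dedicated=aryn-ocr:NoExecute --overwrite || return 1
  done
}
```

Because the effect is `NoExecute`, pods already running on the node that do not tolerate the taint are evicted, so consider tainting the node before other workloads are scheduled on it. To remove the taint later, run `kubectl taint nodes <your-node-name> dedicated=aryn-ocr:NoExecute-`.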