
Distributed Feature Discovery

Distributed feature discovery is an optional feature that can be enabled on demand. DataRobot currently supports it only on AWS installations. DataRobot uses the AWS EMR Serverless 7.3.0 base Docker image, which is currently exposed to multiple CVEs: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-730-common-vulnerabilities-exposures.html . Make sure that this complies with your company's policies before enabling the feature.

Dependencies

  • All requirements for enabling the Spark Batch Feature in the compute Spark service are satisfied.
  • The image is available in ECR, as described here.
  • The environment variables listed in the table below are configured.

Configuration Values

To configure these options, refer to the Tuning DataRobot Environment Variables section of this guide.

| Config | Description | Default |
|---|---|---|
| CSP_SPARK_SAFER_CUSTOM_IMAGE_HOST | ECR registry used to store spark-batch-image. Example: 11122233334444.dkr.ecr.us-east-1.amazonaws.com | None |
| CSP_SPARK_SAFER_CUSTOM_IMAGE_REPO | Repository name in ECR | spark-batch-image |
| ENABLE_SAFER_DISTRIBUTED_MODE | Flag to enable distributed feature discovery | false |
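Before turning on the flag, it can help to check that the three settings in the table are consistent. The Python sketch below reads them, applies the documented defaults, and refuses to proceed when distributed mode is enabled without a registry host (which has no default). It makes several assumptions that DataRobot does not document: the image reference is taken to be `<host>/<repo>` (tag handling is unknown), the flag is assumed to accept `true`/`false`-style strings, and `load_config` and `SaferDistributedConfig` are hypothetical helpers, not part of DataRobot.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# ECR registry hosts have the form <account-id>.dkr.ecr.<region>.amazonaws.com
# (assumption: commercial AWS partition only; China-region hosts differ).
_ECR_HOST = re.compile(r"^\d+\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com$")


@dataclass(frozen=True)
class SaferDistributedConfig:
    """Hypothetical container for the three settings from the table."""
    enabled: bool
    image_host: Optional[str]
    image_repo: str

    @property
    def image_uri(self) -> str:
        # Assumption: the image reference is "<host>/<repo>"; the tag is not documented.
        if self.image_host is None:
            raise ValueError("CSP_SPARK_SAFER_CUSTOM_IMAGE_HOST is not set")
        return f"{self.image_host}/{self.image_repo}"


def _parse_bool(value: str) -> bool:
    # Assumption: these spellings are accepted; DataRobot's own parsing may differ.
    v = value.strip().lower()
    if v in ("true", "1", "yes"):
        return True
    if v in ("false", "0", "no", ""):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def load_config(env: Mapping[str, str] = os.environ) -> SaferDistributedConfig:
    """Read the settings, applying the defaults listed in the table."""
    enabled = _parse_bool(env.get("ENABLE_SAFER_DISTRIBUTED_MODE", "false"))
    host = env.get("CSP_SPARK_SAFER_CUSTOM_IMAGE_HOST") or None  # default: None
    repo = env.get("CSP_SPARK_SAFER_CUSTOM_IMAGE_REPO", "spark-batch-image")
    if enabled:
        # The host has no default, so distributed mode cannot work without it.
        if host is None:
            raise ValueError(
                "ENABLE_SAFER_DISTRIBUTED_MODE is true but "
                "CSP_SPARK_SAFER_CUSTOM_IMAGE_HOST is not set"
            )
        if not _ECR_HOST.match(host):
            raise ValueError(f"not an ECR registry host: {host}")
    return SaferDistributedConfig(enabled, host, repo)


if __name__ == "__main__":
    cfg = load_config({
        "ENABLE_SAFER_DISTRIBUTED_MODE": "true",
        "CSP_SPARK_SAFER_CUSTOM_IMAGE_HOST": "11122233334444.dkr.ecr.us-east-1.amazonaws.com",
    })
    print(cfg.image_uri)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the check testable; in a real deployment, calling `load_config()` with no arguments reads the process environment.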

Enable the Feature

The feature is enabled per user:

  • Log in to the application.
  • Click the Settings icon.
  • Select Feature Access.
  • Search for Enable Feature Discovery in Distributed Mode and make sure it is enabled.