GenAI feature considerations

When working with generative AI capabilities in DataRobot, consider the following. Note that as the product continues to develop, some considerations may change.

Trial users: See the considerations specific to the DataRobot free trial, including supported LLM base models.

General considerations

  • Fewer embeddings are supported through the UI than through the API.

  • If a multilingual dataset exceeds the limit associated with the multilingual model, DataRobot defaults to using the jinaai/jina-embedding-t-en-v1 embedding model.

  • There is no support for adding external/custom vector databases or custom LLMs through the UI.

  • When using LLMs, be aware of the vendor's model versioning and end-of-life schedules. As a best practice, use only endpoints that are generally available when deploying to production.

  • Chatting with a single LLM blueprint in the playground is the only place where previous chat history is taken into account. Comparison prompts and prompts submitted to custom models deployed from the playground do not include previous prompts (history) as context.

  • Note that an API key named [Internal] DR API Access for GenAI Experimentation is created for you when you access the playground or vector database in the UI.

LLM availability

The following table describes the availability of LLMs:

Type US cluster EU cluster
Azure OpenAI GPT-4
Azure OpenAI GPT-4 32k
Azure OpenAI GPT-3.5 Turbo 16k
Azure OpenAI GPT-3.5 Turbo*
Google Bison*
Amazon Titan*

* Available for trial users, cluster-dependent.

Playground considerations

  • Playground sharing is not supported; each user collaborating in a Use Case will see only the playgrounds they have created.

  • Each user can submit 1000 LLM prompts per day across all LLMs. Only successful prompt-response pairs count toward this limit, and they continue to count even if they are later deleted. Bring-your-own (BYO) LLM calls are not counted. Limits for trial users are different, as described in the trial user considerations.
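
As a rough illustration of how the daily prompt limit could be tallied, the sketch below reads the rule as: successful prompt-response pairs count (even if deleted afterward), failed prompts do not, and BYO LLM calls are excluded. The PromptRecord class and both function names are hypothetical and are not part of any DataRobot API.

```python
from dataclasses import dataclass

DAILY_PROMPT_LIMIT = 1000  # per-user daily limit stated above


@dataclass
class PromptRecord:
    """Hypothetical record of one prompt; not a DataRobot object."""
    successful: bool        # a response was returned
    deleted: bool = False   # deleted later; still counted
    byo_llm: bool = False   # sent to a bring-your-own LLM; never counted


def prompts_counted(records):
    """Count successful, non-BYO prompts; deletion does not reduce the count."""
    return sum(1 for r in records if r.successful and not r.byo_llm)


def remaining_prompts(records, limit=DAILY_PROMPT_LIMIT):
    """Prompts left for the day under the assumed counting rule."""
    return max(0, limit - prompts_counted(records))
```

Under this reading, deleting a prompt from the playground does not free up quota.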

Vector database considerations

The following sections describe considerations related to vector databases:

Supported dataset types

When uploading datasets to create a vector database, the only supported format is .zip. DataRobot then processes the .zip to create a .csv containing text columns with an associated reference ID (file path) column. The reference ID column is created automatically when the .zip is uploaded. All files should be either in the root of the archive or in a single folder inside the archive. Using a folder tree hierarchy is not supported.

Regarding file types, DataRobot provides the following support:

  • .txt documents

  • PDF documents

    • Text-based PDFs are supported.
    • Image-based PDFs are not fully supported. That is, images are generally ignored but do not lead to errors.
    • Documents with mixed image and text content are supported; only the text is parsed.
    • Single documents consisting only of images result in empty documents and are ignored.
    • Datasets consisting of image-only documents (no text) are not processable.
  • Mixed PDF and .txt documents in a single dataset are supported.
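
One way to prepare a compliant upload is to flatten a local folder into a single-level .zip before uploading. The following is a minimal sketch using only the Python standard library; the function name build_flat_archive and the choice to raise an error on duplicate file names are assumptions made for illustration, not DataRobot behavior.

```python
import zipfile
from pathlib import Path

# File types taken from the support list above.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def build_flat_archive(source_dir, archive_path):
    """Zip every .txt and .pdf file under source_dir into the archive root.

    Returns the list of member names written to the archive.
    """
    source = Path(source_dir)
    archive = Path(archive_path).resolve()
    written = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            # Skip folders and the archive itself if it lives in source_dir.
            if not path.is_file() or path.resolve() == archive:
                continue
            if path.suffix.lower() not in SUPPORTED_SUFFIXES:
                continue
            # Flatten: keep only the file name, dropping any folder tree.
            name = path.name
            if name in written:
                raise ValueError(f"duplicate file name after flattening: {name}")
            zf.write(path, arcname=name)
            written.append(name)
    return written
```

Because the reference ID column is derived from the file path, flattening changes the IDs to bare file names, so distinct names matter.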

Dataset limits

The global 1GB dataset limit is applied during vector database creation, after the text is extracted from the document. Additional dynamic limits are listed below:

  • jinaai/jina-embedding-t-en-v1: Supported to the 1GB global limit
  • sentence-transformers/all-MiniLM-L6-v2: Supported to the 650MB limit
  • Multilingual-e5-base: Supported to the 250MB limit
  • E5-base-v2: Supported to the 250MB limit
  • E5-large-v2: Supported to the 100MB limit
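
The limits above can be expressed as a simple lookup to check which embedding models can handle a given amount of extracted text. This is an illustrative sketch only: the helper name models_that_fit is hypothetical, and treating 1MB as 1,000,000 bytes is an assumption, since the limits do not say whether decimal or binary units are meant.

```python
MB = 1_000_000  # assumption: decimal megabytes

# Per-model limits copied from the list above (1GB written as 1000MB).
EMBEDDING_LIMITS_BYTES = {
    "jinaai/jina-embedding-t-en-v1": 1000 * MB,
    "sentence-transformers/all-MiniLM-L6-v2": 650 * MB,
    "Multilingual-e5-base": 250 * MB,
    "E5-base-v2": 250 * MB,
    "E5-large-v2": 100 * MB,
}


def models_that_fit(extracted_text_bytes):
    """Return the embedding models whose limit covers the extracted text size."""
    return [
        name
        for name, limit in EMBEDDING_LIMITS_BYTES.items()
        if extracted_text_bytes <= limit
    ]
```

Note that the size to compare is the extracted text, not the size of the uploaded .zip.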

Playground deployment considerations

Consider the following when registering and deploying LLMs from the playground:

  • Setting API keys through the DataRobot credential management system is supported. Those credentials are accessed as environment variables in a deployment.

  • Registration and deployment are supported for:

    • All base LLMs in the playground

    • LLMs with vector databases

  • Registration and deployment are not supported for draft blueprints.

  • Creating a custom model version from an LLM blueprint associated with a large vector database (500+ MB) can take a while. You can leave the model workshop while the model is being created.
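
Because credentials are exposed to a deployment as environment variables, custom model code can read them at startup. The sketch below shows one defensive way to do that; the variable name EXAMPLE_LLM_API_KEY and the function read_llm_api_key are hypothetical placeholders, and the actual variable name depends on how the credential is configured.

```python
import os


def read_llm_api_key(var_name="EXAMPLE_LLM_API_KEY"):
    """Read an API key from the environment, failing early if it is missing.

    var_name is a hypothetical placeholder, not a name defined by DataRobot.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "check the credential attached to the deployment."
        )
    return key
```

Failing at startup, rather than on the first prompt, makes a missing credential easier to diagnose.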

Trial user considerations

The following considerations apply only to DataRobot free trial users:

  • You can create up to 15 vector databases, computed across multiple Use Cases. Deleted vector databases are included in this count.

  • You can make 300 LLM API calls. Only successful prompt-response pairs count toward this limit, and they continue to count even if they are later deleted.

  • “Bring-your-own” LLMs and vector databases are not available.

See also the section on LLM availability.


Updated February 22, 2024