# Availability information

> Availability information - The following sections describe support for the various elements that are
> part of GenAI model creation, including LLMs, embeddings, sharing and permissions, and supported
> dataset types.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-01T23:10:48.102327+00:00` (UTC).

## Primary page

- [Availability information](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html): Full documentation for this topic (HTML).

## Sections on this page

- [LLM availability](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#llm-availability): In-page section heading.
- [Amazon Bedrock](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#amazon-bedrock): In-page section heading.
- [Azure OpenAI](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#azure-openai): In-page section heading.
- [Google VertexAI](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#google-vertexai): In-page section heading.
- [Anthropic](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#anthropic): In-page section heading.
- [Cerebras](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#cerebras): In-page section heading.
- [TogetherAI](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#togetherai): In-page section heading.
- [Deprecated and retired LLMs](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#deprecated-and-retired-llms): In-page section heading.
- [Embeddings availability](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#embeddings-availability): In-page section heading.
- [Sharing and permissions](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#sharing-and-permissions): In-page section heading.
- [Supported dataset types](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#supported-dataset-types): In-page section heading.
- [Vector database formats](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#vector-database-formats): In-page section heading.
- [Evaluation datasets](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-availability.html#evaluation-datasets): In-page section heading.

## Related documentation

- [Reference documentation](https://docs.datarobot.com/en/docs/reference/index.html): Linked from this page.
- [GenAI reference](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/index.html): Linked from this page.
- [DataRobot free trial](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/genai-consider.html#trial-user-considerations): Linked from this page.
- [brief descriptions](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-reference.html): Linked from this page.
- [building LLM blueprints](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/build-llm-blueprints.html#select-an-llm): Linked from this page.
- [LLM gateway service](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/dr-llm-gateway.html): Linked from this page.
- [build and validate an external LLM integration](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/ext-llm.html): Linked from this page.
- [Bolt-on Governance API](https://docs.datarobot.com/en/docs/api/code-first-tools/drum/structured-custom-models.html#chat): Linked from this page.
- [LLM blueprints tab](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/compare-llm.html#llm-blueprints-tab): Linked from this page.
- [evaluation metrics](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#configure-evaluation-metrics): Linked from this page.
- [metadata filtering](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/rag-chatting.html#metadata-filtering): Linked from this page.
- [use the Python API client](https://docs.datarobot.com/en/docs/api/reference/public-api/features.html#create-ocr-job-resource): Linked from this page.

## Documentation content

# Availability information

The following sections describe support for the various elements that are part of GenAI model creation:

- LLM availability, including deprecation and retirement information.
- Embeddings availability, including multilingual language support.
- Sharing and permissions.
- Supported dataset types.

> [!TIP] Trial users
> See the considerations specific to the [DataRobot free trial](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/genai-consider.html#trial-user-considerations).

See also, for reference, [brief descriptions](https://docs.datarobot.com/en/docs/reference/gen-ai-ref/llm-reference.html) of each available LLM.

## LLM availability

Note the following when working with LLMs and the LLM gateway:

**LLM gateway availability:**
Availability of the LLM gateway is based on your pricing package. When enabled, the specific LLMs available via the LLM gateway are ultimately controlled by the organization administrator. If you see an LLM listed below but do not see it as a selection option when [building LLM blueprints](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/build-llm-blueprints.html#select-an-llm), contact your administrator. See also the [LLM gateway service](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/dr-llm-gateway.html) documentation for information on the DataRobot API endpoint that can be used to interface with external LLM providers. LLM availability through the LLM gateway service is restricted to non-government regions.

To integrate with LLMs not available through the LLM gateway service, see the notebook that outlines how to [build and validate an external LLM integration](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/ext-llm.html) using the DataRobot Python client.
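Because the LLM gateway exposes a chat completion interface, a request to it can be sketched as an OpenAI-style payload. The following is a minimal illustration, not DataRobot's documented client: the endpoint path (`genai/llmgw/chat/completions`), the environment variable names, and the example model ID are all assumptions to verify against your own DataRobot environment.

```python
import os

# Hypothetical sketch of calling the LLM gateway through an OpenAI-style
# chat interface. The endpoint path and model ID are assumptions; confirm
# both against your DataRobot environment before relying on them.
def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,  # the provider model ID (the Δ value in the tables below)
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(payload: dict) -> dict:
    """Send the payload to the gateway (requires network access and credentials)."""
    import requests  # third-party; pip install requests

    endpoint = os.environ.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
    resp = requests.post(
        f"{endpoint}/genai/llmgw/chat/completions",  # assumed path
        headers={"Authorization": f"Bearer {os.environ['DATAROBOT_API_TOKEN']}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

payload = build_chat_request("example-provider/example-model", "Say hello.")
```

`build_chat_request` is pure and can be inspected offline; only `send_chat_request` touches the network, which is why the two steps are separated.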

**Rate limits:**
Depending on the configuration, your organization may be subject to rate limits on total number of chat completion calls. If the application returns a message that your maximum has been reached, it will reset in 24 hours. The time to reset is indicated in the error message. To remove the limit, contact your administrator or DataRobot representative to manage your organization's pricing plan.
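Since the rate-limit error reports a time to reset, client code can wait and retry rather than fail outright. The sketch below is illustrative only: `RateLimitError` and its `reset_seconds` attribute are hypothetical names standing in for whatever error your client library actually surfaces.

```python
import time

# Illustrative retry wrapper (not a DataRobot API). RateLimitError and
# reset_seconds are hypothetical stand-ins for the real error your client
# raises when the organization's chat completion limit is reached.
class RateLimitError(Exception):
    def __init__(self, reset_seconds: float):
        super().__init__(f"rate limited; resets in {reset_seconds}s")
        self.reset_seconds = reset_seconds

def call_with_rate_limit_retry(call, max_retries: int = 3, sleep=time.sleep):
    """Invoke call(), sleeping until the reported reset on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries; propagate to the caller
            sleep(err.reset_seconds)
```

The `sleep` parameter is injected so the wait can be stubbed out in tests.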

**For org admins:**
All LLMs that are part of the LLM gateway are disabled by default and can only be enabled by the organization administrator. To enable an LLM for a user or org, search for `LLM_` in the Feature access page; it will return the full list of available LLMs. These LLMs are supported for production usage in the DataRobot platform.

Additionally, an org admin can toggle `Enable Fast-Track LLMs`, also in Feature access, to gain access to the newest LLMs from external LLM providers. These LLMs have not yet gone through the full DataRobot testing and approval process and are not recommended for production usage.

**Region availability for self-managed:**
Provider region availability information applies only to DataRobot's managed multi-tenant SaaS environments. It is not relevant for self-hosted (single-tenant SaaS, VPC, and on-premise) deployments where the provider region is dependent on the installation configuration.


The following tables list LLMs by provider.

- Amazon Bedrock
- Azure OpenAI
- Google VertexAI
- Anthropic
- Cerebras
- TogetherAI

In the tables below, which list LLM availability by provider, note the following:

| Indicator | Explanation |
| --- | --- |
| † | Due to EU regulations, model access is disabled for Cloud users on the EU platform. |
| ‡ | Due to JP regulations, model access is disabled for Cloud users on the JP platform. |
| Δ | The model ID the playground uses for calling the LLM provider's services. This value is also the recommended value for the model parameter when using the Bolt-on Governance API for deployed LLM blueprints. |
| © | Meta Llama is licensed under the Meta Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. |

#### Amazon Bedrock

#### Azure OpenAI

#### Google VertexAI

#### Anthropic

#### Cerebras

#### TogetherAI

### Deprecated and retired LLMs

In the quickly advancing agentic AI landscape, LLMs are constantly improving, with new versions replacing older models. To address this, DataRobot's LLM deprecation process marks LLMs and LLM blueprints with a badge to indicate upcoming changes. Note that retirement dates are set by the provider and are subject to change.

The following LLMs are currently, or will soon be, deprecated and removed:

**Amazon Bedrock:**

| LLM | Retirement date |
| --- | --- |
| Anthropic Claude 2.1 | Retired |
| Anthropic Claude 3 Sonnet | Retired |
| Anthropic Claude Opus 3 | Retired |
| Anthropic Claude Sonnet 3.5 v1 | Retired |
| Anthropic Claude Sonnet 3.5 v2 | Retired |
| Anthropic Claude Sonnet 3.7 v1 | 2026-04-28 |
| Anthropic Claude Haiku 3.5 v1 | 2026-06-19 |
| Anthropic Claude Opus 4 | Retired |
| Cohere Command Light Text v14 | Retired |
| Cohere Command Text v14 | Retired |
| Titan | Retired |

**Azure OpenAI:**

| LLM | Retirement date |
| --- | --- |
| GPT-3.5 Turbo | Retired |
| GPT-3.5 Turbo 16k | Retired |
| GPT-4 | Retired |
| GPT-4 32k | Retired |
| GPT-4 Turbo | Retired |
| GPT-4o Mini | Retired |
| o1-mini | Retired |

**Google VertexAI:**

| LLM | Retirement date |
| --- | --- |
| Bison | Retired |
| Gemini 3 Pro Preview | 2026-03-26 |
| Gemini 2.0 Flash | 2026-06-01 |
| Gemini 2.0 Flash Lite | 2026-06-01 |
| Gemini 1.5 Flash | Retired |
| Gemini 1.5 Pro | Retired |
| Claude Opus 3 | Retired |
| Claude Sonnet 3.5 | Retired |
| Claude Sonnet 3.5 v2 | Retired |
| Claude Sonnet 3.7 | 2026-05-11 |
| Claude Haiku 3.5 | 2026-07-05 |
| Claude Haiku 3 | 2026-08-23 |
| Claude Opus 4 | 2026-09-13 |
| Claude Opus 4.1 | 2026-09-13 |
| Mistral CodeStral 2501 | Retired |
| Mistral Large 2411 | Retired |
| Meta Llama 4 Maverick | Retired |

**Anthropic:**

| LLM | Retirement date |
| --- | --- |
| Claude Opus 3 | Retired |
| Claude Sonnet 3.5 v1 | Retired |
| Claude Sonnet 3.5 v2 | Retired |
| Claude Sonnet 3.7 | Retired |
| Claude Haiku 3 | Retired |
| Claude Haiku 3.5 | Retired |
| Claude Opus 4 | 2026-09-13 |
| Claude Opus 4.1 | 2026-09-13 |

**Cerebras:**

| LLM | Retirement date |
| --- | --- |
| Cerebras Qwen 3 32B | Retired |
| Cerebras Llama 3.3 70B | Retired |

**TogetherAI:**

| LLM | Retirement date |
| --- | --- |
| Arcee AI Virtuoso Large | Retired |
| Arcee AI Coder-Large | Retired |
| Arcee AI Maestro Reasoning | Retired |
| Meta Llama 3 70B Instruct Reference | Retired |
| Meta Llama 3.1 405B Instruct Turbo | Retired |
| Meta Llama 4 Scout Instruct | Retired |
| Mistral (7B) Instruct | Retired |
| Mistral (7B) Instruct v0.3 | Retired |
| Mistral (7B) Instruct v0.2 | Retired |
| Marin Community Marin 8B Instruct | Retired |
| Meta Llama 3 8B Instruct Lite | Retired |
| Meta Llama 3.1 8B Instruct Turbo | Retired |
| Meta Llama 3.2 3B Instruct Turbo | Retired |
| Meta Llama 4 Maverick Instruct | Retired |
| Mistral Small 3 Instruct (24B) | Retired |
| Mistral Mixtral-8x7B Instruct v0.1 | Retired |


To help protect experiments and deployments from unexpected removal of provider support, badges for deprecated LLMs are shown in the LLM blueprint creation panel. If already built, affected LLM blueprints are marked with a warning or notice, with dates provided on hover.

**LLM selection:**
When selecting an LLM for building LLM blueprints, LLMs in the selection panel are marked with a Deprecate badge when the end-of-support date for the LLM falls within 90 days.

[https://docs.datarobot.com/en/docs/images/llm-select-button.png](https://docs.datarobot.com/en/docs/images/llm-select-button.png)

**LLM blueprints:**
Once LLM blueprints are built, they are displayed in the [LLM blueprints tab](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/compare-llm.html#llm-blueprints-tab). Deprecated or retired LLM blueprints are marked with a warning or notice, with dates provided on hover:

[https://docs.datarobot.com/en/docs/images/llm-deprecate-badge.png](https://docs.datarobot.com/en/docs/images/llm-deprecate-badge.png)


- When an LLM is in the **deprecation** process, support for the LLM will be removed in 90 days. Badges and warnings are present, but functionality is not restricted.
- When **retired**, assets created from the retired model are still viewable, but creation of new assets is prevented. Retired LLMs cannot be used in single or comparison prompts.

Some [evaluation metrics](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#configure-evaluation-metrics), for example faithfulness and correctness, use an LLM in their configuration. For those, messages are displayed when viewing or configuring the metrics, as well as in the prompt response.

If a retired LLM has been deployed, the deployment will fail to return predictions because DataRobot does not control the credentials used for the underlying LLM. If this happens, replace the deployed LLM with a new model.

## Embeddings availability

DataRobot supports the following types of embeddings for encoding data; all are transformer models trained on a mixture of supervised and unsupervised data.

| Embedding type | Description | Language |
| --- | --- | --- |
| cl-nagoya/sup-simcse-ja-base | A medium-sized language model from the Nagoya University Graduate School of Informatics ("Japanese SimCSE Technical Report"). It is a fast model for Japanese RAG. Input dimension\*: 512; Output dimension: 768; Parameters: 110M | Japanese |
| huggingface.co/intfloat/multilingual-e5-base | A medium-sized language model from Microsoft Research ("Weakly-Supervised Contrastive Pre-training on large MultiLingual corpus") used for multilingual RAG performance across multiple languages. Input dimension\*: 512; Output dimension: 768; Parameters: 278M | 100+, see ISO 639 |
| huggingface.co/intfloat/multilingual-e5-small | A smaller language model from Microsoft Research ("Weakly-Supervised Contrastive Pre-training on large MultiLingual corpus") used for multilingual RAG, with faster performance than MULTILINGUAL_E5_BASE. This embedding model is a good fit for low-latency applications. Input dimension\*: 512; Output dimension: 384; Parameters: 118M | 100+, see ISO 639 |
| intfloat/e5-base-v2 | A medium-sized language model from Microsoft Research ("Weakly-Supervised Contrastive Pre-training on large English Corpus") for medium-to-high RAG performance. With fewer parameters and a smaller architecture, it is faster than E5_LARGE_V2. Input dimension\*: 512; Output dimension: 768; Parameters: 110M | English |
| intfloat/e5-large-v2 | A large language model from Microsoft Research ("Weakly-Supervised Contrastive Pre-training on large English Corpus") designed for optimal RAG performance. It is classified as slow due to its architecture and size. Input dimension\*: 512; Output dimension: 1024; Parameters: 335M | English |
| jinaai/jina-embedding-t-en-v1 | A tiny language model trained using Jina AI's Linnaeus-Clean dataset. It is pre-trained on the English corpus and is the fastest, and default, embedding model offered by DataRobot. Input dimension\*: 512; Output dimension: 384; Parameters: 14M | English |
| jinaai/jina-embedding-s-en-v2 | Part of the Jina Embeddings v2 family, this embedding model is the optimal choice for long-document embeddings (large chunk sizes, up to 8192). Input dimension\*: 8192; Output dimension: 384; Parameters: 33M | English |
| sentence-transformers/all-MiniLM-L6-v2 | A small language model fine-tuned on a dataset of 1B sentence pairs. It is relatively fast and pre-trained on the English corpus. It is not recommended for RAG, however, as it was trained on older data. Input dimension\*: 256; Output dimension: 384; Parameters: 33M | English |

\* Input dimension = `max_sequence_length`
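The speed/quality trade-off in the table can be captured in a small lookup helper. The sketch below is illustrative, not a DataRobot API: it hard-codes a subset of the table's entries (model name, language, parameter count) and treats parameter count as a rough proxy for latency, which is our simplification.

```python
# Illustrative helper (not a DataRobot API): pick an embedding model from a
# subset of the table above based on language and a speed/quality preference.
# Model names and parameter counts are copied from the table; the selection
# logic (fewer parameters == faster) is our own simplification.
EMBEDDINGS = {
    "jinaai/jina-embedding-t-en-v1": {"language": "English", "params_m": 14},
    "intfloat/e5-base-v2": {"language": "English", "params_m": 110},
    "intfloat/e5-large-v2": {"language": "English", "params_m": 335},
    "intfloat/multilingual-e5-small": {"language": "multilingual", "params_m": 118},
    "intfloat/multilingual-e5-base": {"language": "multilingual", "params_m": 278},
    "cl-nagoya/sup-simcse-ja-base": {"language": "Japanese", "params_m": 110},
}

def pick_embedding(language: str, prefer_speed: bool = True) -> str:
    """Return the smallest (fastest) or largest (highest quality) match."""
    matches = {n: i for n, i in EMBEDDINGS.items() if i["language"] == language}
    if not matches:
        raise ValueError(f"no embedding listed for language {language!r}")
    pick = min if prefer_speed else max
    return pick(matches, key=lambda n: matches[n]["params_m"])
```

For example, `pick_embedding("English")` favors the tiny Jina model, while `prefer_speed=False` selects the large E5 variant.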

## Sharing and permissions

The following table describes GenAI component-related user permissions. All roles (Consumer, Editor, Owner) refer to the user's role in the Use Case; access to the various functions is based on the Use Case roles. For example, because sharing is handled at the Use Case level, you cannot share only a vector database (vector databases do not define any sharing rules).

## Supported dataset types

The following describes requirements for vector databases and evaluation datasets.

### Vector database formats

When uploading datasets for use in creating a vector database, the supported formats are `.zip` and `.csv`. Two columns are mandatory: `document` and `document_file_path`. Up to 50 additional metadata columns can be added for use in filtering during prompt queries. Note that for purposes of [metadata filtering](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/rag-chatting.html#metadata-filtering), `document_file_path` is displayed as `source`.

For `.zip` files, DataRobot processes the archive to create a `.csv` version that contains a text column (`document`) with an associated reference ID (`document_file_path`) column. All content in the text column is treated as strings. The reference ID column is created automatically when the `.zip` is uploaded. All files should be either in the root of the archive or in a single folder inside the archive; a folder tree hierarchy is not supported.

Regarding file types, DataRobot provides the following support:

- `.txt` documents
- PDF documents
- `.docx` documents are supported; the older `.doc` format is not supported.
- `.md` documents, and the `.markdown` variant, are supported.
- A mix of all supported document types in a single dataset is allowed.

### Evaluation datasets

Evaluation datasets serve as reference data for evaluation metrics and aggregated metrics. The evaluation dataset must:

- Be a CSV file.
- Be in the Data Registry.
- Have at least one text or categorical column.
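The column requirement can be sanity-checked locally before uploading. The sketch below is a rough pre-upload check, not a DataRobot API: it treats any column whose sampled values are not all numeric as "text or categorical", which is our heuristic, and it cannot verify Data Registry membership.

```python
import csv

# Illustrative pre-upload check (not a DataRobot API): verify that a local
# CSV has at least one column whose sampled values are not all numeric, as a
# rough stand-in for "at least one text or categorical column".
def has_text_or_categorical_column(csv_path: str, sample_rows: int = 100) -> bool:
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if not reader.fieldnames:
            return False  # empty file: no columns at all
        non_numeric = set()
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            for col, value in row.items():
                try:
                    float(value)
                except (TypeError, ValueError):
                    non_numeric.add(col)  # at least one non-numeric value
        return bool(non_numeric)
```

Registration in the Data Registry still has to happen in DataRobot itself; this only guards against uploading an all-numeric file.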
