# Vector databases

> Vector databases enable RAG (Retrieval-Augmented Generation) workflows by storing
> document embeddings and retrieving relevant context for LLM prompts. Vector databases allow you to
> create knowledge bases from your documents and use them to provide context-aware responses in your
> generative AI applications.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.280315+00:00` (UTC).

## Primary page

- [Vector databases](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html): Full documentation for this topic (HTML).

## Sections on this page

- [Validate a deployment as a vector database](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#validate-a-deployment-as-a-vector-database): In-page section heading.
- [Get supported embeddings](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#get-supported-embeddings): In-page section heading.
- [Get supported text chunking configurations](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#get-supported-text-chunking-configurations): In-page section heading.
- [Create a vector database](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#create-a-vector-database): In-page section heading.
- [Update a vector database](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#update-a-vector-database): In-page section heading.
- [Link vector database to LLM blueprint](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#link-vector-database-to-llm-blueprint): In-page section heading.
- [List and manage vector databases](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#list-and-manage-vector-databases): In-page section heading.
- [Export a vector database as a dataset](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/vector-databases.html#export-a-vector-database-as-a-dataset): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Developer learning](https://docs.datarobot.com/en/docs/api/dev-learning/index.html): Linked from this page.
- [Python API client user guide](https://docs.datarobot.com/en/docs/api/dev-learning/python/index.html): Linked from this page.
- [Generative AI](https://docs.datarobot.com/en/docs/api/dev-learning/python/genai/index.html): Linked from this page.
- [Vector databases](https://docs.datarobot.com/en/docs/api/reference/sdk/gen-vector-databases.html#datarobot.models.genai.vector_database.VectorDatabase): Linked from this page.

## Documentation content

# Vector databases

[Vector databases](https://docs.datarobot.com/en/docs/api/reference/sdk/gen-vector-databases.html#datarobot.models.genai.vector_database.VectorDatabase) enable RAG (Retrieval-Augmented Generation) workflows by storing document embeddings and retrieving relevant context for LLM prompts. Vector databases allow you to create knowledge bases from your documents and use them to provide context-aware responses in your generative AI applications.
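As a conceptual illustration of the retrieval step only, the sketch below ranks stored embeddings by cosine similarity to a query embedding. The three-dimensional vectors, the `cosine_similarity` and `retrieve` helpers, and the sample texts are all made up for this example. It does not use the DataRobot client and makes no claim about how DataRobot stores or searches embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, k=2):
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "vector database": (chunk text, hand-written embedding) pairs.
store = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Returns require the original receipt.", [0.8, 0.3, 0.1]),
]

# Pretend embedding of the question "How do refunds work?"
query_embedding = [1.0, 0.2, 0.0]
context = retrieve(query_embedding, store, k=2)
print(context)
```

In a real RAG workflow the embeddings come from an embedding model rather than being written by hand, and the retrieved chunks are passed to the LLM as context.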

## Validate a deployment as a vector database

Before using a deployment as a vector database, validate it using [datarobot.CustomModelVectorDatabaseValidation](https://docs.datarobot.com/en/docs/api/reference/sdk/gen-vector-databases.html#datarobot.models.genai.vector_database.CustomModelVectorDatabaseValidation):

```
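import datarobot as dr
from datarobot.models.genai.vector_database import CustomModelVectorDatabaseValidation

# `deployment` is assumed to be a previously retrieved deployment object.
validation = CustomModelVectorDatabaseValidation.create(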
    deployment_id=deployment.id,
    name="My Vector Database",
    prompt_column_name="query",
    target_column_name="citations",
    wait_for_completion=True
)
if validation.validation_status == "PASSED":
    print("Vector database validation passed!")
```
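The check above only handles the success case. One way to fail fast otherwise is a small guard like the one below. `require_passed` is a hypothetical helper, not part of the DataRobot client. It relies only on the `validation_status` attribute and the `"PASSED"` value used above, and the stand-in objects exist only so the sketch runs without a DataRobot connection.

```python
from types import SimpleNamespace


def require_passed(validation):
    """Return the validation unchanged if it passed; raise otherwise."""
    status = validation.validation_status
    if status != "PASSED":
        raise RuntimeError(
            f"Vector database validation did not pass (status: {status})"
        )
    return validation


# Stand-in objects that mimic the single attribute the helper reads.
ok = SimpleNamespace(validation_status="PASSED")
require_passed(ok)

# "SOMETHING_ELSE" is a placeholder, not a documented status value.
not_ok = SimpleNamespace(validation_status="SOMETHING_ELSE")
try:
    require_passed(not_ok)
except RuntimeError as err:
    print(err)
```

With a real validation object, the call would be `require_passed(validation)` placed directly after the `create(...)` call.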

## Get supported embeddings

View available embedding models using [datarobot.VectorDatabase.get_supported_embeddings()](https://docs.datarobot.com/en/docs/api/reference/sdk/gen-vector-databases.html#datarobot.models.genai.vector_database.VectorDatabase.get_supported_embeddings):

```
supported = dr.genai.VectorDatabase.get_supported_embeddings()
print(f"Default embedding model: {supported.default_embedding_model}")
for model in supported.embedding_models:
    print(f"  {model.name}: {model.description}")
```

You can also get recommended embeddings for a specific dataset:

```
dataset = dr.Dataset.get(dataset_id)
supported = dr.genai.VectorDatabase.get_supported_embeddings(dataset_id=dataset.id)
print(f"Default embedding: {supported.default_embedding_model}")
```

## Get supported text chunking configurations

To view available text chunking options:

```
chunking_configs = dr.genai.VectorDatabase.get_supported_text_chunkings()
for config in chunking_configs.text_chunking_configs:
    print(f"Chunking config: {config}")
```

## Create a vector database

Create a vector database from a dataset containing your documents. When creating a vector database, you should specify the following:

- `dataset_id`: The ID of the dataset used for creation.
- `chunking_parameters`: Parameters defining how documents are split and embedded, including embedding model, chunking method, chunk size, and overlap percentage.
- `name`: An optional user-friendly name for the vector database.
- `use_case`: An optional Use Case to link the vector database to.

```
from datarobot.models.genai.vector_database import ChunkingParameters

# Upload the source documents and look up the recommended embedding model.
dataset = dr.Dataset.upload("documents.csv")
supported = dr.genai.VectorDatabase.get_supported_embeddings(dataset_id=dataset.id)

# Define how documents are split into chunks and embedded.
chunking_params = ChunkingParameters(
    embedding_model=supported.default_embedding_model,
    chunking_method="semantic",
    chunk_size=500,
    chunk_overlap_percentage=10
)

# `use_case_id` is assumed to hold the ID of an existing Use Case.
vector_db = dr.genai.VectorDatabase.create(
    dataset_id=dataset.id,
    name="Document Knowledge Base",
    chunking_parameters=chunking_params,
    use_case=use_case_id
)
print(vector_db)
```
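To build intuition for how `chunk_size` and `chunk_overlap_percentage` interact, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden illustration: `chunk_text` is a hypothetical helper, it measures size in characters, and it is not the `"semantic"` method used above or DataRobot's actual chunking logic.

```python
def chunk_text(text, chunk_size, chunk_overlap_percentage):
    """Split text into fixed-size character windows that overlap by a percentage."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap_percentage < 100:
        raise ValueError("chunk_overlap_percentage must be in [0, 100)")

    overlap = chunk_size * chunk_overlap_percentage // 100
    step = chunk_size - overlap  # how far each window advances; always >= 1

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(text, chunk_size=10, chunk_overlap_percentage=20)
print(chunks)
```

Under the same assumptions, a chunk size of 500 with a 10 percent overlap would share 50 characters between neighboring chunks and advance 450 characters per chunk.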

## Update a vector database

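To add documents to an existing vector database, create a new version from an updated dataset and pass the existing database's ID as `parent_vector_database_id`: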

```
new_dataset = dr.Dataset.upload("updated_documents.csv")
updated_vector_db = dr.genai.VectorDatabase.create(
    dataset_id=new_dataset.id,
    parent_vector_database_id=vector_db.id,
    update_llm_blueprints=True
)
```

## Link vector database to LLM blueprint

Associate a vector database with an LLM blueprint for RAG:

```
vector_db = dr.genai.VectorDatabase.get(vector_db_id)
blueprint = dr.genai.LLMBlueprint.get(blueprint_id)
blueprint.update(
    vector_database=vector_db.id,
    vector_database_settings={
        "max_documents_retrieved_per_prompt": 3
    }
)
```
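The `max_documents_retrieved_per_prompt` setting caps how many retrieved chunks accompany each prompt. The sketch below shows the general idea with plain strings. `build_prompt` and the prompt layout are hypothetical and are not how DataRobot assembles prompts internally.

```python
def build_prompt(question, ranked_chunks, max_documents_retrieved_per_prompt=3):
    """Prepend at most N of the best-ranked chunks to the question as context."""
    selected = ranked_chunks[:max_documents_retrieved_per_prompt]
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {question}"


# Chunks are assumed to be sorted from most to least relevant.
ranked_chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
prompt = build_prompt("What is the refund policy?", ranked_chunks)
print(prompt)
```

With the cap at 3, the fourth chunk is dropped even though it was retrieved. A lower cap keeps prompts shorter at the cost of less context.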

## List and manage vector databases

List all vector databases:

```
all_dbs = dr.genai.VectorDatabase.list()
print(f"Found {len(all_dbs)} vector database(s):")
for db in all_dbs:
    print(f"  - {db.name} (ID: {db.id}, Status: {db.execution_status})")
```
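When the list is long, a per-status summary can be easier to scan than one line per database. The sketch below counts objects by their `execution_status` attribute. `count_by_status` is a hypothetical helper, and the stand-in objects and status strings are placeholders so the example runs without a DataRobot connection.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_status(dbs):
    """Count objects by their execution_status attribute."""
    return Counter(db.execution_status for db in dbs)


# Stand-ins for the objects returned by VectorDatabase.list();
# "STATUS_A" and "STATUS_B" are placeholder values, not documented statuses.
dbs = [
    SimpleNamespace(name="kb-1", execution_status="STATUS_A"),
    SimpleNamespace(name="kb-2", execution_status="STATUS_B"),
    SimpleNamespace(name="kb-3", execution_status="STATUS_A"),
]

counts = count_by_status(dbs)
for status, count in counts.most_common():
    print(f"{status}: {count}")
```

With real data, `count_by_status(all_dbs)` would summarize the list retrieved above.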

Filter vector databases by Use Case:

```
use_case = dr.UseCase.get(use_case_id)
use_case_dbs = dr.genai.VectorDatabase.list(use_case=use_case)
```

Get vector database details:

```
vector_db = dr.genai.VectorDatabase.get(vector_db_id)
print(f"Name: {vector_db.name}")
print(f"Size: {vector_db.size} bytes")
print(f"Status: {vector_db.execution_status}")
print(f"Chunks count: {vector_db.chunks_count}")
```

Delete a vector database:

```
vector_db.delete()
```

## Export a vector database as a dataset

To export a vector database as a dataset:

```
vector_db = dr.genai.VectorDatabase.get(vector_db_id)
export_job = vector_db.submit_export_dataset_job()
exported_dataset_id = export_job.dataset_id
```
