Generative AI documentation > Code-based GenAI > Create and deploy a DataRobot vector database

Create and deploy a DataRobot vector database¶

This notebook demonstrates how to use the DataRobot Python SDK to create and deploy a vector database (VDB) using DataRobot's built-in embeddings. This is the simplest approach for creating vector databases and doesn't require creating custom models. If you need to use your own embedding model (BYO embeddings), see the Create vector databases from BYO embeddings notebook for a complete example. This example workflow does the following:

Creates a Use Case and uploads a dataset.
Configures chunking parameters for your documents.
Creates a vector database using DataRobot's built-in embedding models.
Deploys vector databases for use in production.

Note: For self-managed users, code samples that reference app.datarobot.com need to be changed to the appropriate URL for your instance.

Setup¶

Prerequisites¶

This workflow requires the following feature flags. Contact your DataRobot representative or administrator for information on enabling these features:

Enable MLOps
Enable Public Network Access for all Custom Models (Premium)
Enable Monitoring Support for Generative Models
Enable Custom Inference Models
Enable GenAI Experimentation

Import libraries¶

This section imports Python libraries needed to interact with DataRobot, configure document chunking, and manage the creation and deployment of vector databases. These libraries supply the necessary interfaces for connection, configuration, and orchestration of the workflow.

In [ ]:

Copied!





import datarobot as dr
from datarobot.models.genai.vector_database import VectorDatabase
from datarobot.models.genai.vector_database import ChunkingParameters
from datarobot.enums import VectorDatabaseEmbeddingModel
from datarobot.enums import VectorDatabaseChunkingMethod
from datarobot.enums import PredictionEnvironmentPlatform
from datarobot.enums import PredictionEnvironmentModelFormats
import time
import requests
import datarobot as dr
from datarobot.models.genai.vector_database import VectorDatabase
from datarobot.models.genai.vector_database import ChunkingParameters
from datarobot.enums import VectorDatabaseEmbeddingModel
from datarobot.enums import VectorDatabaseChunkingMethod
from datarobot.enums import PredictionEnvironmentPlatform
from datarobot.enums import PredictionEnvironmentModelFormats
import time
import requests

Connect to DataRobot¶

This section managed the connection to the DataRobot client. Read more about different options for connecting to DataRobot from the Python client.

In [ ]:

Copied!





# Option 1: Use environment variables (recommended)
# The client will automatically use DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN
dr.Client()

# Option 2: Explicitly provide endpoint and token
# endpoint = "https://app.datarobot.com/api/v2"
# token = "<ADD_YOUR_TOKEN_HERE>"
# dr.Client(endpoint=endpoint, token=token)
# Option 1: Use environment variables (recommended)
# The client will automatically use DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN
dr.Client()

# Option 2: Explicitly provide endpoint and token
# endpoint = "https://app.datarobot.com/api/v2"
# token = ""
# dr.Client(endpoint=endpoint, token=token)

Create a vector database with DataRobot built-in embeddings¶

This approach uses DataRobot's built-in embedding models to create and deploy a vector database from the uploaded document.

Get the Use Case¶

Vector databases must be associated with a Use Case in DataRobot. This step uses the Use Case that the notebook is running in (if available), or creates a new one if the notebook is not in a Use Case. Set USE_CASE_NAME to define a more specific Use Case name.

In [ ]:

Copied!





# Create the Use Case
USE_CASE_NAME = "VDB Example Use Case"
use_case = dr.UseCase.create(name=USE_CASE_NAME)
print(f"Created Use Case: {use_case.name}")
# Create the Use Case
USE_CASE_NAME = "VDB Example Use Case"
use_case = dr.UseCase.create(name=USE_CASE_NAME)
print(f"Created Use Case: {use_case.name}")

Upload a dataset¶

This section uploads the documents dataset. The dataset should be a ZIP file containing text files (.txt, .md, etc.). Each file is processed and chunked for the vector database.

In [ ]:

Copied!





# Get or create the dataset
DATASET_NAME = "pirate_resumes.zip"
dataset = None

# Search for existing dataset by exact name
try:
    all_datasets = dr.Dataset.list(filter_failed=True)
    for d in all_datasets:
        if d.name == DATASET_NAME:
            dataset = d
            print(f"Found existing dataset: {dataset.name}")
            break
except Exception:
    pass

# Upload if not found
if not dataset:
    dataset_url = "https://s3.amazonaws.com/datarobot_public_datasets/genai/pirate_resumes.zip"
    dataset = dr.Dataset.create_from_url(dataset_url)
    print(f"Uploaded new dataset: {dataset.name}")

# Add dataset to Use Case if not already added
try:
    use_case_datasets = use_case.get_datasets()
    dataset_ids = [d.id for d in use_case_datasets]
    if dataset.id not in dataset_ids:
        use_case.add(dataset)
        print(f"Added dataset to Use Case")
except Exception:
    pass  # Already added or error adding
# Get or create the dataset
DATASET_NAME = "pirate_resumes.zip"
dataset = None

# Search for existing dataset by exact name
try:
    all_datasets = dr.Dataset.list(filter_failed=True)
    for d in all_datasets:
        if d.name == DATASET_NAME:
            dataset = d
            print(f"Found existing dataset: {dataset.name}")
            break
except Exception:
    pass

# Upload if not found
if not dataset:
    dataset_url = "https://s3.amazonaws.com/datarobot_public_datasets/genai/pirate_resumes.zip"
    dataset = dr.Dataset.create_from_url(dataset_url)
    print(f"Uploaded new dataset: {dataset.name}")

# Add dataset to Use Case if not already added
try:
    use_case_datasets = use_case.get_datasets()
    dataset_ids = [d.id for d in use_case_datasets]
    if dataset.id not in dataset_ids:
        use_case.add(dataset)
        print(f"Added dataset to Use Case")
except Exception:
    pass  # Already added or error adding

Configure chunking parameters¶

Configure how your documents will be split into chunks. The chunking parameters determine:

Chunk size: Maximum number of characters per chunk.
Chunk overlap: Percentage of overlap between chunks (helps preserve context).
Chunking method: The algorithm used to split text (recursive, fixed, etc.).
Embedding model: The model used to generate embeddings (optional, defaults to Jina).

In [ ]:

Copied!





chunking_parameters = ChunkingParameters(
    embedding_model=VectorDatabaseEmbeddingModel.JINA_EMBEDDING_T_EN_V1,
    chunking_method=VectorDatabaseChunkingMethod.RECURSIVE,
    chunk_size=256,
    chunk_overlap_percentage=25,
    separators=["\n\n", "\n", " ", ""],
)
chunking_parameters = ChunkingParameters(
    embedding_model=VectorDatabaseEmbeddingModel.JINA_EMBEDDING_T_EN_V1,
    chunking_method=VectorDatabaseChunkingMethod.RECURSIVE,
    chunk_size=256,
    chunk_overlap_percentage=25,
    separators=["\n\n", "\n", " ", ""],
)

Create the vector database¶

Create the vector database using the dataset and chunking parameters. This process:

Splits the documents into chunks.
Generates embeddings for each chunk using the specified embedding model.
Stores the chunks and embeddings in the vector database.

This process typically takes 30-60 seconds depending on the size of the dataset.

In [ ]:

Copied!





vdb = VectorDatabase.create(
    dataset_id=dataset.id,
    chunking_parameters=chunking_parameters,
    use_case=use_case,
    name="My Vector Database"
)
vdb = VectorDatabase.create(
    dataset_id=dataset.id,
    chunking_parameters=chunking_parameters,
    use_case=use_case,
    name="My Vector Database"
)

Check the status of the vector database until it completes successfully.

In [ ]:

Copied!





max_wait_time = 600
check_interval = 5
start_time = time.time()

print("Waiting for vector database creation...")
while time.time() - start_time < max_wait_time:
    vdb = VectorDatabase.get(vdb.id)
    status = vdb.execution_status
    if status == "COMPLETED":
        print(f"Vector database created: {vdb.name}")
        break
    elif status == "FAILED":
        error_msg = getattr(vdb, 'error_message', 'Unknown error')
        raise Exception(f"Vector database creation failed: {error_msg}")
    else:
        # Show progress if available
        percentage = getattr(vdb, 'percentage', None)
        if percentage is not None:
            print(f"  Status: {status} ({percentage}%)")
        else:
            print(f"  Status: {status}...")
    time.sleep(check_interval)
else:
    raise Exception(f"Vector database creation timed out after {max_wait_time} seconds")

assert vdb.execution_status == "COMPLETED", f"Vector database creation failed with status: {vdb.execution_status}"
max_wait_time = 600
check_interval = 5
start_time = time.time()

print("Waiting for vector database creation...")
while time.time() - start_time < max_wait_time:
    vdb = VectorDatabase.get(vdb.id)
    status = vdb.execution_status
    if status == "COMPLETED":
        print(f"Vector database created: {vdb.name}")
        break
    elif status == "FAILED":
        error_msg = getattr(vdb, 'error_message', 'Unknown error')
        raise Exception(f"Vector database creation failed: {error_msg}")
    else:
        # Show progress if available
        percentage = getattr(vdb, 'percentage', None)
        if percentage is not None:
            print(f"  Status: {status} ({percentage}%)")
        else:
            print(f"  Status: {status}...")
    time.sleep(check_interval)
else:
    raise Exception(f"Vector database creation timed out after {max_wait_time} seconds")

assert vdb.execution_status == "COMPLETED", f"Vector database creation failed with status: {vdb.execution_status}"

Deploy the vector database¶

Once the vector database is created, deploy it for production use. There are two main ways to deploy vector databases:

Direct deployment: Deploy the vector database directly to a prediction environment using the Python SDK
Send to Workshop: Register the vector database as a custom model first, then deploy it

Create a prediction environment¶

First, you need a prediction environment. DataRobot Serverless is typically used for vector database deployments. This section creates a new prediction environment if one doesn't already exist. In addition, this section selects the resource bundle required to run the vector database.

In [ ]:

Copied!





PREDICTION_ENVIRONMENT_NAME = "Vector Database Prediction Environment"

# Get or create prediction environment
prediction_environment = None
for env in dr.PredictionEnvironment.list():
    if env.name == PREDICTION_ENVIRONMENT_NAME:
        prediction_environment = env
        break

if prediction_environment is None:
    prediction_environment = dr.PredictionEnvironment.create(
        name=PREDICTION_ENVIRONMENT_NAME,
        platform=PredictionEnvironmentPlatform.DATAROBOT_SERVERLESS,
        supported_model_formats=[
            PredictionEnvironmentModelFormats.DATAROBOT,
            PredictionEnvironmentModelFormats.CUSTOM_MODEL
        ],
    )
    print(f"Created prediction environment: {prediction_environment.name}")
else:
    print(f"Using existing prediction environment: {prediction_environment.name}")

# Select 3XL resource bundle
resource_bundle_id = None
try:
    dr_client = dr.Client()
    bundles_url = f"{dr_client.endpoint}/mlops/compute/bundles/"
    headers = {"Authorization": f"Bearer {dr_client.token}"}
    bundles_response = requests.get(bundles_url, headers=headers, params={"useCases": "customModel"})
    
    if bundles_response.status_code == 200:
        bundles_data = bundles_response.json()
        if bundles_data.get("data"):
            bundles = bundles_data["data"]
            # Look for 3XL bundle
            bundle_3xl = next((b for b in bundles if "3XL" in b.get("name", "").upper()), None)
            if bundle_3xl:
                resource_bundle_id = bundle_3xl["id"]
                print(f"Selected 3XL bundle: {bundle_3xl['name']}")
            else:
                # Fallback to largest available
                sorted_bundles = sorted(bundles, key=lambda b: b.get("memoryBytes", 0), reverse=True)
                if sorted_bundles:
                    resource_bundle_id = sorted_bundles[0]["id"]
                    print(f"Warning: 3XL bundle not found. Using largest available: {sorted_bundles[0]['name']}")
        else:
            print("Using memory settings (no resource bundles available)")
    else:
        print("Using memory settings (resource bundles not enabled)")
except (ImportError, KeyError) as e:
    print(f"Using memory settings (error checking bundles): {e}")
except requests.RequestException as e:
    print(f"Using memory settings (network error checking bundles): {e}")
PREDICTION_ENVIRONMENT_NAME = "Vector Database Prediction Environment"

# Get or create prediction environment
prediction_environment = None
for env in dr.PredictionEnvironment.list():
    if env.name == PREDICTION_ENVIRONMENT_NAME:
        prediction_environment = env
        break

if prediction_environment is None:
    prediction_environment = dr.PredictionEnvironment.create(
        name=PREDICTION_ENVIRONMENT_NAME,
        platform=PredictionEnvironmentPlatform.DATAROBOT_SERVERLESS,
        supported_model_formats=[
            PredictionEnvironmentModelFormats.DATAROBOT,
            PredictionEnvironmentModelFormats.CUSTOM_MODEL
        ],
    )
    print(f"Created prediction environment: {prediction_environment.name}")
else:
    print(f"Using existing prediction environment: {prediction_environment.name}")

# Select 3XL resource bundle
resource_bundle_id = None
try:
    dr_client = dr.Client()
    bundles_url = f"{dr_client.endpoint}/mlops/compute/bundles/"
    headers = {"Authorization": f"Bearer {dr_client.token}"}
    bundles_response = requests.get(bundles_url, headers=headers, params={"useCases": "customModel"})
    
    if bundles_response.status_code == 200:
        bundles_data = bundles_response.json()
        if bundles_data.get("data"):
            bundles = bundles_data["data"]
            # Look for 3XL bundle
            bundle_3xl = next((b for b in bundles if "3XL" in b.get("name", "").upper()), None)
            if bundle_3xl:
                resource_bundle_id = bundle_3xl["id"]
                print(f"Selected 3XL bundle: {bundle_3xl['name']}")
            else:
                # Fallback to largest available
                sorted_bundles = sorted(bundles, key=lambda b: b.get("memoryBytes", 0), reverse=True)
                if sorted_bundles:
                    resource_bundle_id = sorted_bundles[0]["id"]
                    print(f"Warning: 3XL bundle not found. Using largest available: {sorted_bundles[0]['name']}")
        else:
            print("Using memory settings (no resource bundles available)")
    else:
        print("Using memory settings (resource bundles not enabled)")
except (ImportError, KeyError) as e:
    print(f"Using memory settings (error checking bundles): {e}")
except requests.RequestException as e:
    print(f"Using memory settings (network error checking bundles): {e}")

Send vector database to workshop¶

Before deploying, send the vector database to the custom model workshop. This creates a custom model version that can be registered and deployed.

The code uses the resource configuration determined when the prediction environment was created or checked (resource bundles if available, otherwise memory settings).

In [ ]:

Copied!





assert vdb.execution_status == "COMPLETED", f"Vector database must be completed. Current status: {vdb.execution_status}"

# Send to workshop with 3XL bundle or memory settings
if resource_bundle_id:
    custom_model_version = vdb.send_to_custom_model_workshop(
        resource_bundle_id=resource_bundle_id,
        replicas=1,
        network_egress_policy=dr.NETWORK_EGRESS_POLICY.PUBLIC,
    )
else:
    custom_model_version = vdb.send_to_custom_model_workshop(
        maximum_memory=4096*1024*1024,
        replicas=1,
        network_egress_policy=dr.NETWORK_EGRESS_POLICY.PUBLIC,
    )

print(f"Custom model version created: {custom_model_version}")
assert vdb.execution_status == "COMPLETED", f"Vector database must be completed. Current status: {vdb.execution_status}"

# Send to workshop with 3XL bundle or memory settings
if resource_bundle_id:
    custom_model_version = vdb.send_to_custom_model_workshop(
        resource_bundle_id=resource_bundle_id,
        replicas=1,
        network_egress_policy=dr.NETWORK_EGRESS_POLICY.PUBLIC,
    )
else:
    custom_model_version = vdb.send_to_custom_model_workshop(
        maximum_memory=4096*1024*1024,
        replicas=1,
        network_egress_policy=dr.NETWORK_EGRESS_POLICY.PUBLIC,
    )

print(f"Custom model version created: {custom_model_version}")

Register the model¶

Next, register the custom model version. If a registered model with the same name already exists, this step adds a new version to the existing model instead of creating a duplicate.

In [ ]:

Copied!





REGISTERED_MODEL_NAME = f"Vector Database - {vdb.name}"

# Register model (adds new version if model already exists)
existing_models = [m for m in dr.RegisteredModel.list() if m.name == REGISTERED_MODEL_NAME]

if existing_models:
    registered_model_version = dr.RegisteredModelVersion.create_for_custom_model_version(
        custom_model_version_id=custom_model_version.id,
        registered_model_id=existing_models[0].id,
    )
    print(f"Added new version to existing registered model: {REGISTERED_MODEL_NAME}")
else:
    registered_model_version = dr.RegisteredModelVersion.create_for_custom_model_version(
        custom_model_version_id=custom_model_version.id,
        registered_model_name=REGISTERED_MODEL_NAME,
    )
    print(f"Created new registered model: {REGISTERED_MODEL_NAME}")
REGISTERED_MODEL_NAME = f"Vector Database - {vdb.name}"

# Register model (adds new version if model already exists)
existing_models = [m for m in dr.RegisteredModel.list() if m.name == REGISTERED_MODEL_NAME]

if existing_models:
    registered_model_version = dr.RegisteredModelVersion.create_for_custom_model_version(
        custom_model_version_id=custom_model_version.id,
        registered_model_id=existing_models[0].id,
    )
    print(f"Added new version to existing registered model: {REGISTERED_MODEL_NAME}")
else:
    registered_model_version = dr.RegisteredModelVersion.create_for_custom_model_version(
        custom_model_version_id=custom_model_version.id,
        registered_model_name=REGISTERED_MODEL_NAME,
    )
    print(f"Created new registered model: {REGISTERED_MODEL_NAME}")

Wait for model build to complete¶

Wait for the registered model version to finish building before deploying.

In [ ]:

Copied!





registered_model = dr.RegisteredModel.get(registered_model_version.registered_model_id)
max_wait_time = 600
check_interval = 10
start_time = time.time()

print("Waiting for model build to complete...")
while time.time() - start_time < max_wait_time:
    version = registered_model.get_version(registered_model_version.id)
    build_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
    
    if build_status in ('READY', 'complete', 'COMPLETE'):
        print(f"Model build completed (status: {build_status})")
        break
    elif build_status in ('FAILED', 'ERROR', 'error'):
        raise Exception(f"Model build failed. Status: {build_status}")
    else:
        print(f"  Build status: {build_status}...")
    time.sleep(check_interval)
else:
    version = registered_model.get_version(registered_model_version.id)
    build_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
    raise Exception(f"Model build timed out. Current status: {build_status}")

# Verify ready status
version = registered_model.get_version(registered_model_version.id)
final_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
if final_status not in ('READY', 'complete', 'COMPLETE'):
    raise Exception(f"Model not ready for deployment. Status: {final_status}")
registered_model = dr.RegisteredModel.get(registered_model_version.registered_model_id)
max_wait_time = 600
check_interval = 10
start_time = time.time()

print("Waiting for model build to complete...")
while time.time() - start_time < max_wait_time:
    version = registered_model.get_version(registered_model_version.id)
    build_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
    
    if build_status in ('READY', 'complete', 'COMPLETE'):
        print(f"Model build completed (status: {build_status})")
        break
    elif build_status in ('FAILED', 'ERROR', 'error'):
        raise Exception(f"Model build failed. Status: {build_status}")
    else:
        print(f"  Build status: {build_status}...")
    time.sleep(check_interval)
else:
    version = registered_model.get_version(registered_model_version.id)
    build_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
    raise Exception(f"Model build timed out. Current status: {build_status}")

# Verify ready status
version = registered_model.get_version(registered_model_version.id)
final_status = getattr(version, 'build_status', None) or getattr(version, 'buildStatus', None)
if final_status not in ('READY', 'complete', 'COMPLETE'):
    raise Exception(f"Model not ready for deployment. Status: {final_status}")

Deploy the registered model¶

Deploy the registered model version to the prediction environment created earlier. The model must be in READY status before deployment process can resolve successfully.

In [ ]:

Copied!





deployment = dr.Deployment.create_from_registered_model_version(
    registered_model_version.id,
    label=f"Vector Database Deployment - {vdb.name}",
    description="Vector database deployment for RAG applications",
    prediction_environment_id=prediction_environment.id,
    max_wait=600,
)

print(f"Deployment created: {deployment.id}")

deployment = dr.Deployment.create_from_registered_model_version(
    registered_model_version.id,
    label=f"Vector Database Deployment - {vdb.name}",
    description="Vector database deployment for RAG applications",
    prediction_environment_id=prediction_environment.id,
    max_wait=600,
)

print(f"Deployment created: {deployment.id}")

Use vector databases in LLM Playgrounds¶

Vector databases created through the Python SDK are automatically available in LLM Playgrounds for use in RAG workflows. See the genai-e2e.ipynb notebook for examples of using vector databases with LLMs.

For detailed deployment instructions and UI options, see the Register and deploy vector databases documentation.

List and manage vector databases¶

Using the command below list all vector databases associated with a Use Case and manage them programmatically.

In [ ]:

Copied!

# List all vector databases in a Use Case
vdbs = VectorDatabase.list(use_case=use_case)
print(f"Found {len(vdbs)} vector database(s) in Use Case '{use_case.name}'")
# List all vector databases in a Use Case
vdbs = VectorDatabase.list(use_case=use_case)
print(f"Found {len(vdbs)} vector database(s) in Use Case '{use_case.name}'")

Next steps:¶

Use your vector database in an LLM Playground.
Learn about creating vector databases with custom embedding models (BYO embeddings).
Explore external vector databases like ChromaDB.