Versioning vector databases¶
The ability to version vector databases—creating child, related entities based on a single parent—brings a host of benefits to GenAI solution building. Versioning uses metadata in the lineage process to help assess and compare results with previous versions. With versioning you can:
- Update the data in your vector database, ensuring the most up-to-date data is available to ground LLM responses.
- Create new versions to build a full vector database lineage while keeping the ability to select previous versions. This allows you to "update" older versions that are used by downstream assets and to roll back to previous versions, if needed.
- Apply "tried and true" chunking and/or embedding parameters from existing vector databases to new data.
- Use the dataset's metadata during retrieval, allowing you to search for chunks in the dataset more effectively.
Action | Description |
---|---|
Create playground using this version | Opens a new playground with the vector database loaded into the LLM configuration. |
Create new version from this version | Creates a new version of the vector database that is based on the version that is currently selected. |
Export this version to the Data Registry | Exports the latest vector database version to Data Registry. It can then be used in different Use Case playgrounds. |
Delete vector database | Deletes the parent vector database and all versions. Because deployments use independent snapshots of the vector database, deleting a vector database in a Use Case does not affect the deployments that use it. |
Versions related to a single parent vector database are displayed in a collapsible right panel. Click any version to update the details and related items to reflect information for the selected version.
Create a version¶
You can create a vector database version from any parent or child version for which you are an owner; you do not need to have been the vector database creator. (You do have to be the creator to delete a vector database or version, as described in the considerations.)
There are two places on the vector database details screen where you can create a new version:
No. | Method | Use to |
---|---|---|
1 | Vector database selector | Create a new version based on the parent version. |
2 | Vector database actions | Create a new version from the selected version. |
In either case, a version creation window opens with fields dependent on your update selection method—adding or replacing data.
Fields | Description |
---|---|
Update vector database version | Select the data and chunking settings for the new child version. |
Current vector database configuration | When adding data. Review the configuration of the vector database version from which the new version is created. |
Test chunking | When replacing data. Set whether to use chunking, and if so, the chunking configuration. |
Related items | When related items are connected to the source of the new version, manage which assets are updated to use the newly created version. |
Choose how to update the vector database. You can either add data to the existing source data or replace the data source with entirely new data.
Select a data source¶
Whether adding or replacing data, you select the data source using the Select data dropdown. You can select any data that is associated with the Use Case. If the data you need is not already associated with the Use Case, use the Add data option to open the Data Registry and add a new registered dataset. Any data you add from the Data Registry is handled according to your selection and added to the Use Case, where it can be used in other vector databases.
Update vector database version: Add¶
Use the fields in this section to select the changes you want made for the new version. The new version is named, by default, `VX`. This name increments by one from the last version created in this vector database lineage. That does not mean that versions can only be built from the immediately previous version. For example, if you have `Parent-vdb`, `V1`, `V2`, and `V3`, and you create a new version from `V2`, that version will be named `V4`, regardless of its basis.
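The default naming rule described above can be illustrated with a short sketch. This is not DataRobot code: `next_version_name` is a hypothetical helper, and the assumption that a lineage with only a parent would produce `V1` is inferred from the example, not stated in the documentation.

```python
import re


def next_version_name(existing_names):
    """Return the default name for the next version in a lineage.

    Hypothetical helper: takes the highest existing V<number> index and
    adds one. The version the new one is built from is deliberately not
    an input, because the name does not depend on it.
    """
    indices = [
        int(match.group(1))
        for name in existing_names
        if (match := re.fullmatch(r"V(\d+)", name))
    ]
    # Assumption: with no numbered versions yet, numbering starts at V1.
    return f"V{max(indices, default=0) + 1}"


lineage = ["Parent-vdb", "V1", "V2", "V3"]
new_name = next_version_name(lineage)  # the same whether built from V2 or V3
```

Because the counter belongs to the lineage and not to the source version, two versions created from different bases can never share a name.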
If you click the Add data radio button, whichever data source you select is appended to the existing data in the vector database.
Current vector database configuration¶
When adding new data, the middle section of the window reports the configuration of the vector database this version is built from. This is the same information provided in the Details section on the vector database listing. Note that when selecting the Add data method, you cannot change the chunking configuration. Chunking of the new data uses the same chunking rules as those applied to the data you are appending to. The output reports:
- Basic vector database metadata: ID, creator and creation date, data source name and size.
- Chunking configuration settings: Embedding column and chunking method and settings.
- Metadata columns: Names of columns from the data source, which can later be used for metadata filtering.
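To make the idea of metadata filtering concrete, the following minimal sketch narrows a set of chunks by exact metadata match before any similarity search would run. The `Chunk` class, the `filter_chunks` function, and the column names `product` and `lang` are all hypothetical illustrations and do not reflect how DataRobot stores or filters chunks.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record: text plus metadata column values."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_chunks(chunks, required):
    """Keep only chunks whose metadata matches every key/value in `required`."""
    return [
        chunk
        for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in required.items())
    ]


chunks = [
    Chunk("Reset your password from the login page.", {"product": "portal", "lang": "en"}),
    Chunk("Invoices are issued monthly.", {"product": "billing", "lang": "en"}),
]

# Restrict retrieval candidates to chunks from the "billing" product.
matches = filter_chunks(chunks, {"product": "billing"})
```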
Update vector database version: Replace¶
Choose Replace data and change chunking to replace the data source completely and, optionally, modify the vector database configuration. You are prompted to select the replacement data.
When replacing the data source entirely, both the embedding model and the chunking configuration can be changed. Fundamentally, this method rebuilds the vector database but provides you a starting point from an existing version. You may want to do this, for example, to test prompting strategies or to maintain deployed assets.
After selecting the data source, configure the chunking strategy.
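The difference between the Add and Replace methods can be summarized in a small sketch. It models a version as an immutable record, and every name in it (`VdbVersion`, `add_data`, `replace_data`, the file names, and the setting strings) is a hypothetical stand-in, not the product's API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class VdbVersion:
    """Hypothetical, simplified view of a vector database version."""
    data_sources: Tuple[str, ...]
    embedding_model: str
    chunking: str


def add_data(base: VdbVersion, new_source: str) -> VdbVersion:
    """Add: append a data source; embedding and chunking settings are inherited."""
    return replace(base, data_sources=base.data_sources + (new_source,))


def replace_data(
    base: VdbVersion,
    new_source: str,
    embedding_model: Optional[str] = None,
    chunking: Optional[str] = None,
) -> VdbVersion:
    """Replace: swap the data source; settings start from the base and may change."""
    return VdbVersion(
        data_sources=(new_source,),
        embedding_model=embedding_model or base.embedding_model,
        chunking=chunking or base.chunking,
    )


base = VdbVersion(("docs_2023.zip",), "embed-a", "recursive/512")
added = add_data(base, "docs_2024.zip")
replaced = replace_data(base, "handbook.zip", chunking="semantic")
```

In this sketch the base version is never mutated, which mirrors the idea that each new version is a separate child in the lineage.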
Related items¶
Note
The Related items information is only visible if the configuration has associated deployments, custom models, or registered models.
The help text under Related items indicates the number of assets related to the source vector database that you are creating a new version from (for example, "There are 3 assets connected to this vector database."). Use the radio buttons to choose whether the assets connected to the source of the new version are updated manually or automatically.
When you choose to update manually, you are taken to the vector database details page after saving the new version. From there, navigate to the LLM blueprint in the playground and manually export it to the model workshop. Then, register the custom model that uses the new vector database and do a model replacement for the deployments that should use the newly created model package.
When using the automatic update option, you are prompted to choose exactly which assets you would like DataRobot to update with the new vector database version.
Select either:
- Update all related LLM blueprints to swap the new version into each related LLM blueprint configuration.
- Update all related deployment assets to swap the new version into all related LLM blueprints, deployments, and custom model and registered model versions that are used by the deployment.
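The scope of the two automatic options above can be sketched as a simple selection over related assets. The enum, the asset type strings, and the dictionary layout are hypothetical and chosen only to mirror the wording of the two options.

```python
from enum import Enum


class UpdateScope(Enum):
    """Hypothetical names for the two automatic update options."""
    LLM_BLUEPRINTS = "llm_blueprints"
    DEPLOYMENT_ASSETS = "deployment_assets"


# Asset types touched by each option, following the option descriptions.
ASSET_TYPES = {
    UpdateScope.LLM_BLUEPRINTS: {"llm_blueprint"},
    UpdateScope.DEPLOYMENT_ASSETS: {
        "llm_blueprint",
        "deployment",
        "custom_model_version",
        "registered_model_version",
    },
}


def assets_to_update(related_assets, scope):
    """Return the related assets that the chosen option would swap to the new version."""
    wanted = ASSET_TYPES[scope]
    return [asset for asset in related_assets if asset["type"] in wanted]


related = [
    {"name": "Support bot blueprint", "type": "llm_blueprint"},
    {"name": "Support bot deployment", "type": "deployment"},
    {"name": "Support bot custom model v2", "type": "custom_model_version"},
]

blueprints_only = assets_to_update(related, UpdateScope.LLM_BLUEPRINTS)
everything = assets_to_update(related, UpdateScope.DEPLOYMENT_ASSETS)
```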
Comparing versions¶
Use the Details section to compare vector database versions and see the results of changes implemented.
Create a playground¶
Use either the button in the Related items section or the Vector database actions dropdown to create a playground from the selected version of a vector database.
A new playground opens, ready for configuration. The playground opens to the Vector database tab, with the version and chunking information preloaded.
Note
Although this page was reached from within the vector database details page, you are creating a brand new playground. There is no LLM selected, so be sure to set the LLM blueprint in the LLM tab and also consider your prompting strategy.
From the Vector database tab you can modify settings and even create a new vector database. See details on creating vector databases in a playground.
Once the vector database is configured and saved, you can send text queries.