GenAI workflow overview

This section provides a general overview of the generative LLM building workflow, following the steps described below.

Tip

For a hands-on experience, try the GenAI walkthrough.

See the full documentation for information on using your own data and LLMs, working with code instead of the UI, and working with NVIDIA NIM.

Get started

It all begins by creating a Use Case and adding a playground. A playground is a dedicated LLM-focused experimentation environment within Workbench, where you can build, review, compare, evaluate, and deploy LLM blueprints.
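If you prefer to work with code instead of the UI, the DataRobot Python client's GenAI module covers the same steps. The snippet below is a minimal sketch of creating a Use Case and a playground; treat the class and parameter names (for example, `Playground.create`) as assumptions to verify against the client reference for your version.

```python
import datarobot as dr
from datarobot.models.genai.playground import Playground  # GenAI module of the Python client

# Connect to DataRobot; with no arguments, the client reads the
# DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN environment variables.
dr.Client()

# Create a Use Case to hold the playground and related assets.
use_case = dr.UseCase.create(name="Docs Q&A assistant")

# Add an LLM playground to the Use Case.
playground = Playground.create(
    name="Docs Q&A playground",
    use_case=use_case,
)
print(f"Playground created: {playground.id}")
```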

Build a vector database

Once your playground is set up, optionally add a vector database. The role of the vector database is to enrich the prompt with relevant context before it is sent to the LLM. When creating a vector database, you set a basic configuration and define a text chunking strategy.

Vector databases can be versioned to make sure the most up-to-date data is available to ground LLM responses.
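A rough code sketch of the same step follows, assuming the client's `VectorDatabase` and `ChunkingParameters` classes; the exact parameter names, required fields, and supported embedding models vary by version, so confirm them before use.

```python
import datarobot as dr
from datarobot.models.genai.vector_database import ChunkingParameters, VectorDatabase

# The source is a dataset built from an archive of text documents.
dataset = dr.Dataset.create_from_file("product_docs.zip")

# Text chunking configuration; all values shown are illustrative.
chunking = ChunkingParameters(
    embedding_model="jinaai/jina-embedding-t-en-v1",  # example; choose from the supported list
    chunking_method="recursive",
    chunk_size=256,
    chunk_overlap_percentage=25,
    separators=["\n\n", "\n", " "],
)

vector_database = VectorDatabase.create(
    dataset_id=dataset.id,
    chunking_parameters=chunking,
    use_case=use_case,  # the Use Case created earlier
    name="Product docs",
)
```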

Build LLM blueprints

An LLM blueprint represents the full context needed to generate a response from an LLM; the resulting output is what you then compare within the playground.

When you click to create an LLM blueprint, the playground opens with a variety of blueprint-related options. The first step is to configure the LLM blueprint.

In the configuration panel, optionally add a vector database and set the chunking strategy. The new LLM blueprint is listed on the left; add several blueprints to take advantage of DataRobot's LLM blueprint comparison capabilities.
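In code, the equivalent is roughly the following sketch, assuming the client's `LLMDefinition` and `LLMBlueprint` classes; the available LLMs and the keys accepted in `llm_settings` depend on your environment, so treat the values shown as placeholders.

```python
from datarobot.models.genai.llm import LLMDefinition
from datarobot.models.genai.llm_blueprint import LLMBlueprint

# List the LLMs available in this environment and pick one to work with.
llm = LLMDefinition.list()[0]

# Configure a blueprint: LLM + settings + (optional) vector database.
blueprint = LLMBlueprint.create(
    playground=playground,  # the playground created earlier
    name="Grounded blueprint",
    llm=llm,
    llm_settings={"temperature": 0.2, "max_completion_length": 512},
    vector_database=vector_database,
)

# Add a second blueprint (here, without the vector database) so the
# playground has something to compare against.
baseline = LLMBlueprint.create(
    playground=playground,
    name="Ungrounded baseline",
    llm=llm,
    llm_settings={"temperature": 0.2, "max_completion_length": 512},
)
```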

Chat and compare LLM blueprints

Once the LLM blueprint configuration is saved, try sending it prompts (chatting) to determine whether further refinements are needed before considering your LLM blueprint for deployment.

After chatting:

  • View citation metrics to see the top reference document chunks retrieved from the vector database.

  • Open the citations window as a safety check to validate LLM responses.

  • Use the response feedback "thumbs" to rate the prompt answer. Integrating human feedback improves model performance, and exported feedback can be used, for example, to train a predictive model.

Then, use the comparison tool to test different LLM blueprints using the same prompt.
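A code-level sketch of the same flow sends one prompt to every saved blueprint and prints the responses side by side. It assumes the client's `ChatPrompt` class; attribute names such as `result_text` and `citations` are assumptions to verify against the client reference.

```python
from datarobot.models.genai.chat_prompt import ChatPrompt
from datarobot.models.genai.llm_blueprint import LLMBlueprint

prompt_text = "How do I rotate my API key?"

# Send the same prompt to each saved blueprint in the playground.
for bp in LLMBlueprint.list(playground=playground):
    chat_prompt = ChatPrompt.create(
        llm_blueprint=bp,
        text=prompt_text,
        wait_for_completion=True,
    )
    print(f"--- {bp.name} ---")
    print(chat_prompt.result_text)
    # Citations show which vector database chunks grounded the response.
    print(f"Citations retrieved: {len(chat_prompt.citations)}")
```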

See the best practices for prompt engineering when chatting and comparing blueprints.

Use LLM evaluation tools

Using metrics and compliance tests, DataRobot monitors how models are used in production, intervening to block problematic outputs.

Add metrics before or after configuring LLM blueprints:

  • Add evaluation datasets, or generate a synthetic dataset from within DataRobot, to create a systematic assessment of how well the model performs its intended tasks (see the sketch after this list).

  • Combine evaluation metrics and an evaluation dataset to automate the detection of compliance issues through test prompt scenarios. Use DataRobot-supplied evaluations or create your own.
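As a starting point for the evaluation dataset itself, the snippet below builds a small CSV of prompt and expected-response pairs with plain Python and uploads it as a dataset. The file name, column names, and example rows are illustrative; map the columns to the prompt and response fields you configure in DataRobot.

```python
import csv

import datarobot as dr

# Illustrative prompt / expected-response pairs; expand with examples
# that reflect the tasks the LLM blueprint is expected to handle.
rows = [
    {"prompt": "How do I reset my password?",
     "expected_response": "Use the 'Forgot password' link on the sign-in page."},
    {"prompt": "Which regions is the service available in?",
     "expected_response": "US, EU, and APAC regions."},
]

with open("evaluation_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "expected_response"])
    writer.writeheader()
    writer.writerows(rows)

# Upload the CSV so it can be selected as an evaluation dataset.
evaluation_dataset = dr.Dataset.create_from_file("evaluation_dataset.csv")
```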

Deploy an LLM

Once you are satisfied with the LLM blueprint, you can send it to the Registry workshop from the playground.

The Registry workshop is where you test the LLM custom model and ultimately deploy it to Console, a centralized hub for monitoring and model management.

What's next?