# GenAI with governance

> GenAI with governance - Use International Space Station research papers to compare multiple
> retrieval-augmented generation (RAG) pipelines with evaluation metrics and governance.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:09.948960+00:00` (UTC).

## Primary page

- [GenAI with governance](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html): Full documentation for this topic (HTML).

## Sections on this page

- [Assets for download](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#assets-for-download): In-page section heading.
- [1. Create a Use Case](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#1-create-a-use-case): In-page section heading.
- [2. Upload data](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#2-upload-data): In-page section heading.
- [3. Create a vector database](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#3-create-a-vector-database): In-page section heading.
- [4. Add a playground](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#4-add-a-playground): In-page section heading.
- [5. Build an LLM blueprint](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#5-build-an-llm-blueprint): In-page section heading.
- [6. Test the LLM blueprint](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#6-test-the-llm-blueprint): In-page section heading.
- [7. Create comparison blueprints](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#7-create-comparison-blueprints): In-page section heading.
- [8. Compare blueprints](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#8-compare-blueprints): In-page section heading.
- [9. Evaluate responses](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#9-evaluate-responses): In-page section heading.
- [10. Add an evaluation dataset](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#10-add-an-evaluation-dataset): In-page section heading.
- [11. Configure aggregated metrics](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#11-configure-aggregated-metrics): In-page section heading.
- [12. Interpret aggregated metrics](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#12-interpret-aggregated-metrics): In-page section heading.
- [13. Tracing](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#13-tracing): In-page section heading.
- [Next steps](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#next-steps): In-page section heading.

## Related documentation

- [Get started](https://docs.datarobot.com/en/docs/get-started/index.html): Linked from this page.
- [How-tos](https://docs.datarobot.com/en/docs/get-started/how-to/index.html): Linked from this page.
- [GenAI section](https://docs.datarobot.com/en/docs/agentic-ai/index.html): Linked from this page.
- [Working with Use Cases](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/usecases/index.html): Linked from this page.
- [Upload local files](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/add-data/local-file.html): Linked from this page.
- [Dataset types](https://docs.datarobot.com/en/docs/agentic-ai/vector-database/vector-dbs-data.html#add-data-sources): Linked from this page.
- [Embeddings](https://docs.datarobot.com/en/docs/agentic-ai/vector-database/vector-dbs.html#embeddings): Linked from this page.
- [Playground overview](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-overview.html): Linked from this page.
- [Create an LLM blueprint](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/build-llm-blueprints.html): Linked from this page.
- [Chatting with a single LLM blueprint](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/rag-chatting.html#single-llm-blueprint-chat): Linked from this page.
- [Compare LLMs](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/compare-llm.html): Linked from this page.
- [Evaluation datasets](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#add-evaluation-datasets): Linked from this page.
- [Deploy an LLM](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/deploy-llm.html): Linked from this page.
- [GenAI experiment with code](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/genai-e2e.html): Linked from this page.

## Documentation content

This generative AI use case compares multiple retrieval-augmented generation (RAG) pipelines. When completed, you'll have multiple end-to-end pipelines with built-in evaluation, assessment, and logging, providing governance and guardrails. Watch a video version on [YouTube](https://www.youtube.com/watch?v=WiEC5liBBEo).

> [!TIP] Learn more
> To learn more about generative AI at DataRobot, visit the [GenAI section](https://docs.datarobot.com/en/docs/agentic-ai/index.html) of the documentation. There you can find an overview and information about vector databases, playgrounds, and metrics, using both the UI and code.

## Assets for download

To build this experiment as you follow along, first download the file `DataRobot+GenAI+Space+Research.zip` and unzip the archive. Inside you will find a TXT file, a CSV file, and another ZIP file, `Space_Station_Research.zip`. Do not unzip this inner ZIP archive.

[Download files](https://datarobot-doc-assets.s3.us-east-1.amazonaws.com/DataRobot%2BGenAI%2BSpace%2BResearch.zip)

## 1. Create a Use Case

From the **Workbench** directory, click **Create Use Case** in the upper right and name it `Space research`.

Read more: [Working with Use Cases](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/usecases/index.html)

## 2. Upload data

Click **Add data** and then **Upload** on the resulting screen. From the assets you downloaded, upload the file named `Space_Station_Research.zip`. Do not unzip it. This is not the ZIP file you downloaded, but a ZIP within the original downloaded archive. DataRobot will begin registering the dataset.

You can use this time to look at the documents you downloaded that are inside the ZIP file. Locally, unzip `Space_Station_Research.zip` and expand `Space_Station_Annual_Highlights`, which contains PDFs sharing highlights from the International Space Station's research programs from the last few years.

Read more: [Upload local files](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/add-data/local-file.html)

## 3. Create a vector database

After you upload data, you can create a vector database to enrich prompts with relevant context before they are sent to the LLM. To create a vector database:

1. There are two paths to creating a vector database, both of which open the same creation page: use the **Add** dropdown on the right and click **Vector database > Create vector database**, or, from the **Vector database** tab, click **Create vector database**.
2. Set the configuration using the following settings:

| Field | Setting | Notes |
| --- | --- | --- |
| Name | Jina 256/20 | This name was selected to reflect the settings, but could be anything. |
| Data source | Space_Station_Research.zip | All valid datasets uploaded to the Use Case will be available in the dropdown. |
| Embedding model | jinaai/jina-embedding-t-en-v1 | Choose the recommended embedding model, Jina, for this exercise. |


3. Text chunking is the process of splitting text documents into smaller text chunks that are then used to generate embeddings. You can use separator rules to divide content, set chunk overlap, and set the maximum number of tokens in each chunk. For this walkthrough, only change the chunk overlap percentage; leave **Max tokens per chunk** at the recommended value of 256 and move the **Chunk overlap** slider to 20%.
4. Click **Create Vector Database**; you are returned to the Use Case directory. While the vector database is building, add a second vector database for comparison purposes. This time, use `intfloat/e5-base-v2` as the embedding model. To compare it against the Jina model, set the **Chunk overlap** and **Max tokens per chunk** settings to the same values you used earlier: a chunk overlap of 20% and a maximum of 256 tokens per chunk.

Create any number of vector databases by iterating through this process. The best settings will depend on the type of text that you're working with and the objective of your use case.
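The interaction between **Max tokens per chunk** and **Chunk overlap** can be sketched generically. This is not DataRobot's chunker, and `chunk_text` is a hypothetical helper, but it shows how a 20% overlap makes the tail of each chunk reappear at the head of the next:

```python
def chunk_text(tokens, max_tokens=256, overlap_pct=0.20):
    """Split a token list into chunks of at most max_tokens, where each
    chunk repeats the last overlap_pct share of the previous chunk."""
    overlap = int(max_tokens * overlap_pct)  # 51 tokens at 256 / 20%
    step = max_tokens - overlap              # how far the window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
    return chunks

words = ("The International Space Station hosts hundreds of "
         "research investigations every year").split()
# With tiny settings (8 tokens, 20% overlap) the second chunk starts
# on the last token of the first chunk.
for chunk in chunk_text(words, max_tokens=8, overlap_pct=0.20):
    print(" ".join(chunk))
```

Larger overlap values reduce the chance that a relevant passage is split across a chunk boundary, at the cost of storing more near-duplicate embeddings.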

Read more:

- [Dataset types](https://docs.datarobot.com/en/docs/agentic-ai/vector-database/vector-dbs-data.html#add-data-sources)
- [Embeddings](https://docs.datarobot.com/en/docs/agentic-ai/vector-database/vector-dbs.html#embeddings)
- Chunking settings

## 4. Add a playground

The playground is where you create and compare LLM blueprints, configure metrics, and compare LLM blueprint responses before deployment. Create a playground using one of two methods.

Read more: [Playground overview](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-overview.html)

## 5. Build an LLM blueprint

Once in the playground, create an LLM blueprint:

1. In the playground, on the **LLM blueprints** panel, click **Create LLM blueprint**.
2. In the **Configuration** panel, on the **LLM** tab, set the following:

| Field | Setting | Notes |
| --- | --- | --- |
| LLM | Azure OpenAI GPT-3.5 Turbo | Alternatively, you can add a deployed LLM to the playground, which, when validated, is added to the Use Case and available to all associated playgrounds. |
| Max completion tokens | 1024 (default) | The maximum number of tokens allowed in the completion. |
| Temperature | .1 | Controls the randomness of model output. Change this to focus on truthfulness for scientific research papers. |
| Top P | 1 (default) | Sets a threshold that controls the selection of words included in the response, based on a cumulative probability cutoff for token selection. |


3. From the **Vector database** tab, choose the first vector database built, **Jina 256/20**, and use the default configuration.
4. From the **Prompting** tab, choose **No context**. Context states control whether chat history is sent with the prompt to include relevant context for responses; **No context** sends each prompt as independent input, without history from the chat. Then, enter the following prompt and save the configuration:

   > Your job is to help scientists write compelling pitches to have their talks accepted by conference organizers. You'll be given a proposed title for a presentation. Use details from the documents provided to write a one paragraph persuasive pitch for the presentation.
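**Temperature** and **Top P** are generic decoding parameters rather than anything DataRobot-specific. The sketch below is an illustrative, simplified sampling step, not any vendor's implementation; it shows why a temperature of 0.1 makes output nearly deterministic and how Top P (nucleus sampling) trims the candidate pool:

```python
import math
import random

def sample_token(logits, temperature=0.1, top_p=1.0):
    """Illustrative decoding step: temperature rescales logits before
    the softmax; top_p keeps only the smallest set of tokens whose
    cumulative probability reaches the cutoff (nucleus sampling)."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = {tok: e / total for tok, e in zip(logits, exps)}
    # Nucleus filtering: sort by probability, keep the head of the list.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens and draw one.
    z = sum(kept.values())
    draw, acc = random.random() * z, 0.0
    for tok, p in kept.items():
        acc += p
        if acc >= draw:
            return tok
    return tok

logits = {"orbit": 2.0, "space": 1.5, "banana": -3.0}
# At temperature 0.1 the softmax is sharply peaked, so the top-scoring
# token ("orbit") is returned almost every time.
print(sample_token(logits, temperature=0.1))
```

Dividing logits by a small temperature amplifies the gap between candidates, which is why the walkthrough lowers it to favor truthful, reference-like answers about research papers.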

Read more:

- [Playground overview](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-overview.html)
- [Create an LLM blueprint](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/build-llm-blueprints.html)
- Add a deployed LLM
- LLM settings
- Prompting strategies

## 6. Test the LLM blueprint

Once saved, test the configuration with prompting (also known as "chatting"). Ideas are provided in the TXT file you downloaded. For example, try these two prompts asking for a conference pitch in the **Send a prompt** dialog:

- Blood flow and circulation in space.
- Microgravity is weird.

Next, click the edit icon next to the blueprint name to make it more descriptive, for example `Azure GPT 3.5 Turbo + Jina`, then click confirm.

Read more: [Chatting with a single LLM blueprint](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/rag-chatting.html#single-llm-blueprint-chat)

## 7. Create comparison blueprints

To compare configuration settings, you must first create additional blueprints. To do this, you can:

- Follow the steps above to create a new LLM blueprint.
- Make a copy of the existing blueprint and change one or more settings.

You can do either of these actions from the blueprint configuration area or the LLM blueprints panel. Because the intent is to compare blueprints, the following process copies the blueprint on the **LLM blueprints** panel of the playground.

> [!NOTE] Note
> You can [navigate through the playground](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-overview.html#navigate-the-playground) using the icons at the far left.

1. From the named LLM blueprint, click the **Actions** menu and select **Copy to new LLM blueprint**. All settings from the first blueprint are carried over.
2. Change the vector database (1), save the configuration (2), and name the new blueprint **Azure GPT 3.5 Turbo + E5** (3).
3. Return to the **LLM blueprints** panel to create a third blueprint. From the new LLM blueprint, **Azure GPT 3.5 Turbo + E5**, click **Copy to new LLM blueprint** and this time change the LLM. For this walkthrough, choose **Amazon Titan** and set the **Temperature** value to 0.1. Name the blueprint **Amazon Titan + E5**.

Read more: [Copy LLM blueprints](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/build-llm-blueprints.html#copy-llm-blueprint)

## 8. Compare blueprints

You can compare chats (responses) for up to three LLM blueprints from a single screen. The LLM blueprints tab lists all blueprints available for comparison—with filtering provided to simplify finding what you are interested in—as well as provides quick access to the chat history.

To start the comparison, select all three blueprints by checking the box to the left of the name. Notice that a summary is available for each. Enter a new topic for exploration in the Send a prompt field. For example: `Monitoring astronaut health status.`

Try a different prompt, for example, `Applications of ISS science results on earth`. The response that you prefer is subjective and depends on the use case, but there are some quantitative metrics to help you evaluate.

Read more: [Compare LLMs](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/compare-llm.html)

## 9. Evaluate responses

One method of evaluating a response is to look at the basic information DataRobot returns with each prompt, summarized below the response. Expand the information panel for the LLM blueprint that used the Jina vector database; you can see that the response took seven seconds, had 173 response tokens, and scored 56.86% on the ROUGE-1 confidence metric. The ROUGE-1 metric represents how similar this LLM answer is to the citations provided to aid in its generation.
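ROUGE-1 itself is a standard unigram (single-word) overlap score. Below is a minimal sketch of the common F1 form; the exact variant (precision, recall, or F1) that DataRobot reports is not specified here:

```python
from collections import Counter

def rouge_1_f1(response: str, reference: str) -> float:
    """ROUGE-1 counts overlapping unigrams between a generated
    response and a reference text; this returns the F1 variant."""
    resp = response.lower().split()
    ref = reference.lower().split()
    # Multiset intersection: each shared word counts at most as often
    # as it appears in both texts.
    overlap = sum((Counter(resp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_1_f1(
    "astronauts study blood flow in microgravity",
    "blood flow studies in microgravity aboard the station"), 2))  # prints 0.57
```

A high score against the retrieved citations suggests the answer stays close to its source chunks; a low score can indicate the LLM is drawing on content outside the vector database.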

To better understand the results, look at the citations. You can see a list of the chunks the generated answer from the LLM is based on:

Scroll and read a few of the citations. This is the stage where you can see the impact of the chunk size you selected when you created the vector database. You may get better results with longer or shorter chunks, and could test that by creating additional vector databases.

Read more: [Citations](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/rag-chatting.html#citations)

## 10. Add an evaluation dataset

The metrics described in the step above correspond to one LLM blueprint response, but only so much can be learned from evaluating a single prompt/response. To evaluate which LLM blueprint is the best overall, you will want aggregated metrics. Aggregation combines metrics across many prompts and/or responses, which helps to evaluate a blueprint at a high level and provides a more comprehensive approach to evaluation.

First, in this step, you will add an evaluation dataset, which is required for aggregation. You will configure aggregation in [step 11](https://docs.datarobot.com/en/docs/get-started/how-to/genai-space.html#11-configure-aggregated-metrics).

1. Click the **LLM evaluation** icon in the upper left navigation.
2. From the **LLM evaluation** page, click the **Evaluation datasets** tab and then **Add evaluation dataset**.
3. From the **Add evaluation dataset** panel, click the dataset named `Space_research_evaluation_prompts.csv`, which contains some additional conference titles to be used as a standard reference set.
4. Next, define the **Prompt column name** and the **Response (target) column name** as shown below, then click **Add evaluation dataset**. DataRobot returns to the **Evaluation dataset metrics** configuration page.

   | Field | Setting |
   | --- | --- |
   | Prompt column name | question |
   | Response (target) column name | answer |

Read more: [Evaluation datasets](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#add-evaluation-datasets)

## 11. Configure aggregated metrics

After you add an evaluation dataset, to configure aggregation:

1. Click the **Playground** icon to return to the **LLM blueprint comparison**, then, in the bottom left under the responses, click **Configure aggregation**.
2. In the configuration section, define a **Chat name** and select the **Evaluation dataset** added in the previous section.
3. The **Generate aggregated metrics** page opens. Set the **Latency** and **ROUGE-1** metrics to **Average**, then click **Generate metrics**.

A notification in the lower right confirms that the aggregation job is queued. It can take some time for the aggregation request to process, but the metrics will appear as they complete.
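Conceptually, the **Average** setting is just the arithmetic mean of each per-response metric across the evaluation-dataset rows. The numbers and field names below are invented for illustration:

```python
# Hypothetical per-response metrics for one LLM blueprint: one row per
# prompt in the evaluation dataset (values are made up).
rows = [
    {"latency_s": 7.0, "rouge_1": 0.57},
    {"latency_s": 5.0, "rouge_1": 0.61},
    {"latency_s": 6.0, "rouge_1": 0.50},
]

def aggregate_average(rows, metric):
    """'Average' aggregation: mean of one metric across all responses."""
    return sum(row[metric] for row in rows) / len(rows)

print(aggregate_average(rows, "latency_s"))          # prints 6.0
print(round(aggregate_average(rows, "rouge_1"), 2))  # prints 0.56
```

Averaging over the whole reference set smooths out single-prompt quirks, which is what makes the aggregate a fairer basis for choosing between blueprints.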

Read more: [Aggregated metrics](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#aggregated-metrics)

## 12. Interpret aggregated metrics

When the aggregation job completes, on the LLM blueprint comparison page, open the Aggregated metrics tab. Note that these aggregate metrics are based on the rows in the evaluation dataset.

To see the row-level details that contributed to these values, click an LLM blueprint. Notice on the left panel of the Chats tab, there is an entry named Aggregated metric chat (or a different chat name, as defined in the previous section), which contains all the responses to the prompts in the evaluation dataset.

Scroll through the results to view the conference talks. You can provide feedback with the thumbs-up and thumbs-down icons. For example, for the question (prompt column name) "How are Lichen Liking Space?", give the response some positive feedback (thumbs up).

## 13. Tracing

Tracing the execution of LLM blueprints is a powerful tool for understanding how most parts of the GenAI stack work. The tracing tab provides a log of all components and prompting activities used in generating LLM responses in the playground.

Click the Tracing icon in the upper left navigation to access a log of all the components used in the LLM response generation. The table traces exactly which LLM parameters, which vector database, which system prompt, and which user prompt resulted in a particular generated response.

Scroll the page to the far right to see the user feedback. You can use this information for LLM fine-tuning.

You can also export the log to the DataRobot AI Catalog. From there, you can work with it in other ways, such as writing it to a database table or downloading it.
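As one example of working with the exported log, the snippet below filters traced rows on user feedback using only the standard library; the column names are illustrative, not DataRobot's actual export schema:

```python
import csv
import io

# Stand-in for an exported trace log (schema is hypothetical).
trace_csv = """prompt,response,user_feedback
How are Lichen Liking Space?,Lichens tolerate extreme conditions...,positive
Microgravity is weird.,Fluids behave differently in orbit...,
"""

reader = csv.DictReader(io.StringIO(trace_csv))
# Keep only responses the user rated positively; these are candidate
# examples for LLM fine-tuning.
liked = [row for row in reader if row["user_feedback"] == "positive"]
print(len(liked))  # prints 1
```

The same filter applies unchanged to a downloaded CSV by swapping `io.StringIO(trace_csv)` for an open file handle.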

Read more: [Tracing](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/playground-eval-metrics.html#tracing)

## Next steps

After completing this walkthrough, some suggested next steps are:

- [Deploy an LLM](https://docs.datarobot.com/en/docs/agentic-ai/playground-tools/deploy-llm.html) from the playground.
- Create an end-to-end [GenAI experiment with code](https://docs.datarobot.com/en/docs/agentic-ai/genai-code/genai-e2e.html).
