Build LLM blueprints¶
LLM blueprints represent the full context for what is needed to generate a response from an LLM; the resulting output is what can then be compared within the playground.
To create an LLM blueprint, select that action from either the Comparison panel or the playground welcome screen.
Clicking the create button brings you to the configuration and chatting tools:
Element | Name | Description
---|---|---
1 | Configuration panel | Provides access to the configuration selections available when creating an LLM blueprint.
2 | LLM blueprint card summary | Displays a summary of the LLM blueprint configuration, metrics, and timestamp.
3 | Chat history | Provides access to a record of prompts sent to this LLM blueprint, as well as an option to start a new chat.
4 | Prompt entry | Accepts prompts to begin chatting with the LLM blueprint; the configuration must be saved before the entry is activated.
You can also create an LLM blueprint by copying an existing blueprint.
Set the configuration¶
The configuration panel is where you define the LLM blueprint. From here, you configure the LLM and its settings, the vector database, and the prompting strategy:
LLM selection and settings¶
DataRobot offers a variety of preloaded LLMs, with availability dependent on your cluster and account type.
Alternatively, you can add a deployed LLM to the playground, which, when validated, is added to the Use Case and available to all associated playgrounds. In either case, selecting a base LLM exposes additional configuration options:
Setting | Description |
---|---|
Max completion tokens | The maximum number of tokens allowed in the completion. The combined count of this value and the prompt tokens must be below the model's maximum context size, where the prompt token count comprises the system prompt, user prompt, recent chat history, and vector database citations. |
Temperature | The temperature controls the randomness of model output. Enter a value (range is LLM-dependent), where higher values return more diverse output and lower values return more deterministic results. A value of 0 may return repetitive results. Temperature is an alternative to Top P for controlling the token selection in the output (see the example below). |
Top P | Token selection probability cutoff (Top P) sets a threshold that controls the selection of words included in the response based on a cumulative probability cutoff for token selection. For example, 0.2 considers only the top 20% probability mass. Higher numbers return more diverse options for outputs. Top P is an alternative to Temperature for controlling the token selection in the output (see the example below). |
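To make the token budget concrete, below is a minimal sketch of the constraint described for Max completion tokens. The function name and numbers are illustrative only and are not part of any DataRobot API:

```python
def fits_context_window(prompt_tokens: int,
                        max_completion_tokens: int,
                        max_context_size: int) -> bool:
    # prompt_tokens counts the system prompt, user prompt, recent chat
    # history, and any vector database citations.
    return prompt_tokens + max_completion_tokens <= max_context_size

# A 4,096-token context with a 1,000-token completion budget leaves
# at most 3,096 tokens for the prompt.
assert fits_context_window(3000, 1000, 4096)
assert not fits_context_window(3200, 1000, 4096)
```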
Temperature or Top P?
Consider prompting: “To make the perfect ice cream sundae, top 2 scoops of vanilla ice cream with…”. The desired responses for a suggested next word might be hot fudge, pineapple sauce, and bacon. To increase the probability of what is returned (see the sketch after this example):
- For bacon, set Temperature to the maximum value and leave Top P at the default. Setting a lower Top P with a high Temperature increases the probability of fudge and pineapple and reduces the probability of bacon.
- For hot fudge, set Temperature to 0.
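The following sketch shows how the two settings interact during sampling. It illustrates generic temperature and top-p (nucleus) sampling, not DataRobot's internal implementation, and the logit values are invented for the sundae example:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0):
    # Temperature rescales logits: low values sharpen the distribution
    # (more deterministic), high values flatten it (more diverse).
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top P keeps the smallest set of tokens whose cumulative
    # probability reaches the cutoff, then renormalizes.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()

    return np.random.choice(len(probs), p=filtered)

tokens = ["hot fudge", "pineapple sauce", "bacon"]
logits = [4.0, 2.5, 0.5]
# High temperature, default Top P: "bacon" becomes plausible.
print(tokens[sample_next_token(logits, temperature=2.0)])
# Temperature 0 (near-greedy): always "hot fudge".
print(tokens[sample_next_token(logits, temperature=0.0)])
```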
Each base LLM has default configuration settings. As a result, the only required selection before starting to chat is to choose the LLM.
Add a deployed LLM¶
To add a custom LLM deployed in DataRobot, click Create LLM blueprint to add a new blueprint to the playground. Then, from the playground's blueprint Configuration panel, in the LLM drop-down, click Add deployed LLM:
In the Add deployed LLM dialog box, enter a deployment Name, select the DataRobot deployment associated with the LLM in the Deployment name drop-down, provide the Prompt column name and Response column name defined when you created the custom LLM in the model workshop (for example, `promptText` and `responseText`), then click Validate and add:
After you add a custom LLM and validation is successful, back in the blueprint's Configuration panel, in the LLM drop-down, click Deployed LLM, and then select the Validation ID of the custom model you added:
Finally, you can configure the Vector database and Prompting settings, and click Save configuration to add the blueprint to the playground.
Add a vector database¶
From the Vector database tab, you can optionally select a vector database: a collection of chunks of unstructured text, with a corresponding text embedding for each chunk, indexed for easy retrieval. Vector databases are not required for prompting, but they provide relevant data to the LLM for generating the response. Add a vector database to a playground to experiment with metrics and test responses.
The following table describes the fields of the Vector database tab:
Field | Description |
---|---|
Vector database | Lists all vector databases available in the Use Case (and therefore accessible for use by all of that Use Case's playgrounds). If you select the Add vector database option, the new vector database becomes available to other LLM blueprints, although you must change each blueprint's configuration to apply it. |
Vector database version | Select the version of the vector database that the LLM will use. The field is prepopulated with the version you were viewing when you created the playground. Click Vector database version to leave the playground and open the vector database details page. |
Information | Reports configuration information for the selected version. |
Retriever | Sets the method, neighbor chunk inclusion, and retrieval limits that the LLM uses to return chunks from the vector database. |
Retriever methods¶
The retriever you select defines how the LLM blueprint searches through, and retrieves, the most relevant chunks from the vector database; the retrieved chunks determine which information is provided to the language model. Select one of the following methods:
Method | Description |
---|---|
Single-Lookup Retriever | Performs a single vector database lookup for each query and returns the most similar documents. |
Conversational Retriever (default) | Rewrites the query based on chat history, returning context-aware responses. In other words, this retriever functions similarly to the Single-Lookup Retriever with the addition of query rewrite as its first step. |
Multi-Step Retriever | Performs multiple retrieval steps when returning results. |
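Conceptually, the first two methods differ only in a query-rewrite step. The sketch below illustrates this; `search` and `rewrite` are hypothetical stand-ins for a vector database similarity search and an LLM-based query rewriter:

```python
def single_lookup_retrieve(query, search):
    # Single-Lookup Retriever: one vector database lookup per query,
    # returning the most similar documents.
    return search(query)

def conversational_retrieve(query, chat_history, rewrite, search):
    # Conversational Retriever: rewrite the query into a standalone,
    # context-aware question using the chat history, then perform the
    # same single lookup on the rewritten query.
    standalone_query = rewrite(query, chat_history)
    return search(standalone_query)
```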
Use Add Neighbor Chunks to control whether to add neighboring chunks within the vector database to the chunks that the similarity search retrieves. When enabled, the retriever returns chunks `i`, `i-1`, and `i+1` (for example, if the query retrieves chunk number 42, chunks 41 and 43 are also retrieved).
Notice also that only the primary chunk has a similarity score. This is because the neighbor chunks are added, not calculated, as part of the response.
Also known as context window expansion or context enrichment, this technique includes surrounding chunks adjacent to the retrieved chunk to provide more complete context. Some reasons to enable this include:
- A single chunk may be cut off mid-sentence or may miss important context.
- Related information might span multiple chunks.
- The response might require context from surrounding chunks.
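Below is a minimal sketch of this expansion, assuming chunks are stored with sequential indices; the function and field names are hypothetical:

```python
def expand_with_neighbors(hit_indices, chunks):
    # Context window expansion: include chunks i-1 and i+1 around each
    # retrieved chunk i. Only primary chunks carry a similarity score;
    # neighbors are added for context, not scored.
    primary = set(hit_indices)
    selected = set(primary)
    for i in hit_indices:
        for j in (i - 1, i + 1):
            if 0 <= j < len(chunks):
                selected.add(j)
    return [
        {"index": j, "text": chunks[j], "primary": j in primary}
        for j in sorted(selected)
    ]

# If the query retrieves chunk 42, chunks 41 and 43 come along as context.
chunks = [f"chunk {n}" for n in range(100)]
print([c["index"] for c in expand_with_neighbors([42], chunks)])  # [41, 42, 43]
```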
Enter a value to set the Retrieval limits, which controls the number of returned documents.
The value you set for Top K (nearest neighbors) instructs the LLM on how many relevant chunks to retrieve from the vector database. Chunk selection is based on similarity scores. Consider:
- Larger values provide more comprehensive coverage but also require more processing overhead and may include less relevant results.
- Smaller values provide more focused results and faster processing, but may miss relevant information.
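As an illustration of how Top K selection typically works, here is a sketch of a cosine-similarity search; it is a generic example, not DataRobot's implementation:

```python
import numpy as np

def top_k_chunks(query_embedding, chunk_embeddings, k=5):
    # Rank chunks by cosine similarity to the query and keep the k best.
    q = query_embedding / np.linalg.norm(query_embedding)
    c = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
    scores = c @ q
    # Larger k: broader coverage, more overhead, possibly less relevant hits.
    # Smaller k: focused and fast, but may miss relevant information.
    return np.argsort(scores)[::-1][:k]
```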
Max tokens specifies:
- The maximum size (in tokens) of each text chunk extracted from the dataset when building the vector database.
- The length of the text that is used to create embeddings.
- The size of the citations used in RAG operations.
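The chunking constraint can be pictured with a simple splitter. Real vector databases count tokens with the embedding model's tokenizer; the whitespace split below is only for illustration:

```python
def chunk_text(text, max_tokens):
    # Split text into chunks of at most max_tokens tokens each; every
    # chunk is then embedded and can later be cited in RAG responses.
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```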
Set prompting strategy¶
The prompting strategy is where you configure context (chat history) settings and optionally add a system prompt.
Set context state¶
There are two context states, which control whether chat history is sent with the prompt to include relevant context for responses.
State | Description |
---|---|
Context-aware | When sending input, previous chat history is included with the prompt. This state is the default. |
No context | Sends each prompt as independent input, without history from the chat. |
You can switch between one-time (no context) and context-aware states within a chat. Each state maintains an independent set of history context: going from context-aware, to no context, and back to context-aware clears the earlier history from the prompt. (This only happens once a new prompt is submitted.)
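The sketch below shows what each state sends with a prompt, including the optional system prompt described later on this page; the message format is illustrative, not DataRobot's internal representation:

```python
def build_messages(system_prompt, chat_history, user_prompt, context_aware=True):
    # The system prompt, if set, is prepended to every individual prompt.
    messages = [{"role": "system", "content": system_prompt}] if system_prompt else []
    if context_aware:
        # Context-aware: prior turns from the current chat context are included.
        messages += chat_history
    # No context: only the new prompt is sent, as an independent input.
    messages.append({"role": "user", "content": user_prompt})
    return messages
```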
Context state is reported in two ways:
- A badge, which displays to the right of the LLM blueprint name in both configuration and comparison views, reports the current context state:
- In the configuration view, dividers show the state of the context setting:
Set system prompt¶
The system prompt, an optional field, is a "universal" prompt prepended to all individual prompts for this LLM blueprint. It instructs the LLM how to respond and can affect the structure, tone, format, and content of the generated response.
See an example of system prompt application in the Comparison documentation.
Actions for LLM blueprints¶
The actions available for an LLM blueprint can be accessed from the actions menu next to the name in the left-hand Comparison panel or from LLM blueprint actions in a selected LLM blueprint.
Option | Description |
---|---|
Configure LLM blueprint | From the Comparison panel only. Opens the configuration settings for the selected blueprint for further tuning. |
Edit LLM blueprint | Opens a modal for changing the LLM blueprint name. Saving the new name preserves all saved settings; any unsaved settings revert to the last saved version. |
Copy to new LLM blueprint | Creates a new LLM blueprint from all saved settings of the selected blueprint. |
Send to model workshop | Sends the LLM blueprint to the Registry where it is added to the model workshop. From there it can be deployed as a custom model. |
Delete LLM blueprint | Deletes the LLM blueprint. |
Copy LLM blueprint¶
You can make a copy of an existing LLM blueprint to inherit the settings. Using this approach makes sense when you want to compare slightly different blueprints or use a blueprint you did not create in a shared playground.
You can make a copy in one of two ways:
- From an existing blueprint in the left-hand panel, click the Actions menu and select Copy to new LLM blueprint to create a new copy that inherits the settings of the parent blueprint. The new LLM blueprint opens for further configuration; optionally, choose LLM blueprint actions to change the name.
- From any open LLM blueprint, choose LLM blueprint actions and select Copy to new LLM blueprint.
Change LLM blueprint configuration¶
To change the configuration of an LLM blueprint, choose Configure LLM blueprint from the actions menu in the Comparison panel. The LLM blueprint configuration and chat history display. Change any of the configuration settings and click Save configuration.
When you make changes to an LLM blueprint, its associated chat history is also saved if the configuration is context-aware. All prompts within a chat persist through LLM blueprint changes:
- When you submit a prompt, the history included is everything within the most recent chat context.
- If you switch the LLM blueprint to No context, each prompt is its own chat context.
- If you switch back to Context-aware, that starts a new chat context within the chat.
Note that chats in the configuration view are separate from chats in the Comparison view—the histories don't mingle.