

Playgrounds

A playground, another type of Use Case asset, is the space for creating and interacting with LLM blueprints, comparing the response of each to determine which to use in production to solve a business problem. An LLM blueprint represents the full context for what is needed to generate a response from an LLM; the resulting output is what can then be compared within the playground. This information is captured in the LLM blueprint settings.

You can use playgrounds with or without a vector database. Multiple playgrounds can exist in one Use Case and multiple LLM blueprints can live within a single playground.

The simplified workflow for working with playgrounds is as follows:

  1. Add a playground.
  2. Set the LLM blueprint configuration, including the base LLM and, optionally, a system prompt and vector database.
  3. Chat to test and tune the LLM blueprint.
  4. Build additional LLM blueprints.
  5. (Optional) Use the playground's Comparison tab to compare LLM blueprints side-by-side.
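
This workflow can also be scripted. The following is a minimal sketch using the GenAI classes in the DataRobot Python client (datarobot.models.genai); the method names, parameters, and LLM ID shown are assumptions that may differ across client versions:

    import datarobot as dr
    from datarobot.models.genai.chat_prompt import ChatPrompt
    from datarobot.models.genai.llm_blueprint import LLMBlueprint
    from datarobot.models.genai.playground import Playground

    # Connect to DataRobot; the endpoint and token are placeholders.
    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

    # 1. Add a playground to a Use Case.
    use_case = dr.UseCase.create(name="Support ticket triage")
    playground = Playground.create(name="Triage playground", use_case=use_case)

    # 2. Configure a draft LLM blueprint: base LLM and an optional system prompt.
    #    The llm value and llm_settings keys are assumptions; in a real session,
    #    list the available LLMs with LLMDefinition.list().
    draft = LLMBlueprint.create(
        playground=playground,
        name="Baseline draft",
        llm="azure-openai-gpt-3.5-turbo",
        llm_settings={"system_prompt": "Respond as a support engineer."},
    )

    # 3. Chat to test and tune the draft.
    prompt = ChatPrompt.create(
        llm_blueprint=draft,
        text="How do I start Autopilot from Python?",
        wait_for_completion=True,
    )
    print(prompt.result_text)

    # 4. Save the draft as an LLM blueprint for later comparison and deployment.
    draft.update(is_saved=True)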

Add a playground

To add a playground, first create a Use Case and then either:

  • Click Add > Playground from the dropdown.
  • Select the Playground tab and click the Add playground button. This button is only available for the first playground added to a Use Case; use the Add dropdown for subsequent playgrounds.

When you create a playground, the playground opens with a draft LLM blueprint ready for configuration. From the playground you have access to all the controls for creating LLM blueprint drafts, interacting with and fine-tuning them, and saving them for comparison and potential future deployment.

Note

The playground is named, by default, Playground <timestamp>. You can change the name from the Use Case directory by choosing Edit playground info in the actions menu.

Following are the elements of the playground:

| # | Element | Description |
|---|---------|-------------|
| 1 | Canvas selector | Sets the workspace to either LLM blueprint development or comparison. |
| 2 | Create blueprint draft | Leaves the current screen and opens a clean canvas for creating a new blueprint draft. |
| 3 | Display controls | Opens modals to sort, group, and filter drafts and LLM blueprints in the playground. |
| 4 | Blueprint controls | Provides access to actions; the choices depend on blueprint status (draft or saved). |
| 5 | Configuration | Sets the LLM to use with the LLM blueprint and, once one is selected, the associated configuration parameters, including adding a vector database. |
| 6 | Prompt | Input box for entering the prompt that is sent to the LLM blueprint or draft. |

The middle section is where DataRobot returns LLM responses.

Once playgrounds are created, you can switch between them using the breadcrumbs dropdown:

Set the configuration

Whenever you create a new draft LLM blueprint, the first step in building it is to select the base LLM. Do this from the Configuration section of the playground.

DataRobot provides a selection of LLMs, with availability dependent on your cluster and account type. Select a base LLM to expose additional configuration options:

| Setting | Description |
|---------|-------------|
| System prompt | The system prompt, an optional field, is a "universal" prompt prepended to all individual prompts. It instructs and formats the LLM response, and can affect the structure, tone, format, and content of the generated response. |
| Max completion tokens | The maximum number of tokens allowed in the completion. The combined count of this value and prompt tokens must be below the model's maximum context size, where the prompt token count comprises the system prompt, user prompt, recent chat history, and vector database citations (see the sketch following this table). |
| Temperature | Controls the randomness of model output. Enter a value (the range is LLM-dependent); higher values return more diverse output and lower values return more deterministic results. A value of 0 may return repetitive results. Temperature is an alternative to Top P for controlling token selection in the output (see the example below). |
| Top P | Token selection probability cutoff (Top P) sets a cumulative probability threshold for the tokens considered in the response. For example, 0.2 considers only the top 20% of the probability mass. Higher values return more diverse output. Top P is an alternative to Temperature for controlling token selection in the output (see the example below). |
| Vector database | An optional field that identifies a database composed of chunks of unstructured text and corresponding text embeddings for each chunk, indexed for easy retrieval. |
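
The Max completion tokens budget is easiest to see as arithmetic: the prompt tokens plus the completion allowance must fit in the model's context window. A minimal sketch (the context size and function name are illustrative assumptions):

    # Illustrative only: prompt tokens plus the completion budget must fit
    # within the base LLM's maximum context size (4096 is an assumed value).
    MODEL_CONTEXT_SIZE = 4096

    def completion_budget_ok(system_prompt_tokens: int,
                             user_prompt_tokens: int,
                             history_tokens: int,
                             citation_tokens: int,
                             max_completion_tokens: int) -> bool:
        prompt_tokens = (system_prompt_tokens + user_prompt_tokens
                         + history_tokens + citation_tokens)
        return prompt_tokens + max_completion_tokens <= MODEL_CONTEXT_SIZE

    # 3,000 prompt tokens leave room for at most 1,096 completion tokens here.
    print(completion_budget_ok(200, 300, 1500, 1000, 512))  # True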
Temperature or Top P?

Consider the prompt: "To make the perfect ice cream sundae, top 2 scoops of vanilla ice cream with…". The desired responses for the suggested next word might be hot fudge, pineapple sauce, and bacon. To increase the probability of a given response:

  • For bacon, set Temperature to the maximum value and leave Top P at the default. Setting Top P with a high Temperature increases the probability of fudge and pineapple and reduces the probability of bacon.
  • For hot fudge, set Temperature to 0.
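
Both settings act on the model's next-token probability distribution. The sketch below shows the standard mechanics of temperature scaling and nucleus (Top P) sampling; the toy probabilities for the sundae prompt are invented for illustration:

    import math
    import random

    def sample_next_token(logprobs, temperature=1.0, top_p=1.0):
        # Temperature rescales log-probabilities: low values sharpen the
        # distribution (more deterministic), high values flatten it (more diverse).
        t = max(temperature, 1e-6)
        weights = {tok: math.exp(lp / t) for tok, lp in logprobs.items()}
        total = sum(weights.values())
        probs = {tok: w / total for tok, w in weights.items()}

        # Top P keeps the smallest set of tokens whose cumulative probability
        # reaches the cutoff, then samples from that renormalized set.
        kept, cumulative = [], 0.0
        for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        tokens, ps = zip(*kept)
        return random.choices(tokens, weights=ps, k=1)[0]

    # Toy next-word distribution for the sundae prompt (invented numbers).
    dist = {"hot fudge": math.log(0.6),
            "pineapple sauce": math.log(0.3),
            "bacon": math.log(0.1)}
    print(sample_next_token(dist, temperature=0.1))  # almost always "hot fudge"
    print(sample_next_token(dist, temperature=2.0))  # "bacon" becomes plausible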

Each base LLM has default configuration settings. As a result, the only required selection before starting to chat is to choose the LLM.

Chatting

Note

The selected LLM is locked in after you start chatting. To try a different LLM, create a new LLM blueprint draft from the original and select a different LLM.

Chatting is the activity of sending prompts and receiving a response from the LLM. Once you have set the configuration for your LLM, send it prompts (from the entry box in the lower center panel) to determine whether further refinements are needed before saving your draft as an LLM blueprint.

Chatting within the playground is a "conversation"—you can ask follow-up questions with subsequent prompts. Following is an example of asking the LLM to provide Python code for running DataRobot Autopilot:

You could then ask it to make a change to "that code," and the LLM responds knowing which code is being referenced because it is "aware" of the previous conversation history:
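
This works because recent chat history is sent to the LLM along with each new prompt (it counts toward the prompt tokens described earlier). Below is a generic sketch of the pattern; send_to_llm is a stand-in for a real client call:

    history = []  # (role, text) pairs carried into every subsequent prompt

    def send_to_llm(messages):
        # Stand-in for the real LLM call; echoes the last user turn.
        return f"(model reply to: {messages[-1][1]})"

    def chat(user_prompt, system_prompt="You are a helpful assistant."):
        # The model sees the system prompt plus prior turns, which is what
        # lets it resolve references like "that code" in follow-up questions.
        messages = [("system", system_prompt), *history, ("user", user_prompt)]
        reply = send_to_llm(messages)
        history.append(("user", user_prompt))
        history.append(("assistant", reply))
        return reply

    chat("Write Python code to run DataRobot Autopilot.")
    chat("Now add a comment to that code.")  # history gives "that code" meaning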

Use the playground to test and tune prompts until you are satisfied with the system prompt and settings. Then, click Save as LLM blueprint at the bottom of the right-hand panel.

See also the section on few-shot prompting.

Confidence scores

Confidence scores are computed using the factual consistency metric approach, where a similarity score is computed between the facts retrieved from the vector database and the text generated by the LLM blueprint. The similarity metric used is ROUGE-1; DataRobot GenAI uses an improved version of ROUGE-1 based on insights from "The limits of automatic summarization according to ROUGE".
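
Standard ROUGE-1 measures unigram overlap between a reference and a candidate text. DataRobot's improved variant is not spelled out here, so the sketch below shows only the conventional precision/recall/F1 form, with invented example strings:

    from collections import Counter

    def rouge_1_f1(reference: str, generated: str) -> float:
        """Unigram-overlap F1 between retrieved facts and generated text."""
        ref_counts = Counter(reference.lower().split())
        gen_counts = Counter(generated.lower().split())
        overlap = sum((ref_counts & gen_counts).values())
        if overlap == 0:
            return 0.0
        recall = overlap / sum(ref_counts.values())
        precision = overlap / sum(gen_counts.values())
        return 2 * precision * recall / (precision + recall)

    facts = "DataRobot Autopilot automates model training and evaluation"
    answer = "Autopilot automates training and evaluation of models"
    print(round(rouge_1_f1(facts, answer), 2))  # 0.71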

Build LLM blueprints

LLM blueprints start as drafts and are then saved as final, deployable models. Once saved, they cannot be modified, but they can be copied so that modifications can be made to create a new LLM blueprint. Drafts are labeled with a badge to indicate their status:

Create a draft

There are two methods for creating a draft:

  • From the left-hand LLM blueprint panel, click Create blueprint draft to add a new, untitled draft that is ready for configuration.

  • With an existing blueprint selected in the left-hand panel, select LLM blueprint actions > Copy to new draft to create a new draft ("Copy of...") that inherits the settings of the parent blueprint.

    Click on the name in the center panel to change the name:

After fine-tuning the draft, click Save as LLM blueprint at the bottom of the right-hand panel. Once multiple LLM blueprints are saved, use the playground's Comparison tab to compare them side-by-side.

Actions for blueprints

The actions available for a blueprint depend on its status (draft or saved). Access them from the three dots next to the name in the left-hand panel, or from the Draft actions or LLM blueprint actions dropdown:

LLM blueprint actions:

| Option | Description |
|--------|-------------|
| Copy to new draft | Copies all settings from the LLM blueprint to a new draft ("Copy of..."). Chat history is not copied. |
| Register LLM blueprint | Sends the LLM blueprint to the Registry, where it is added to the custom model workshop. From there it can be deployed as a custom model. |
| Delete LLM blueprint | Deletes the saved blueprint. |

Draft actions:

| Option | Description |
|--------|-------------|
| Save as LLM blueprint | Saves the draft as an LLM blueprint. Once saved, no further changes can be made to the blueprint. To make changes, use the Copy to new draft option for saved blueprints. |
| Copy to new draft | Copies all existing settings to a new draft without making changes to the current draft. |
| Delete draft | Deletes the draft blueprint. |
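
Register LLM blueprint can likewise be scripted. Below is a sketch assuming the DataRobot Python client exposes a registration call on saved blueprints; the method name and parameters are assumptions that may differ by client version:

    from datarobot.models.genai.llm_blueprint import LLMBlueprint

    # Placeholder ID for a saved (not draft) LLM blueprint.
    blueprint = LLMBlueprint.get("LLM_BLUEPRINT_ID")

    # Assumption: register_custom_model sends the blueprint to the Registry's
    # custom model workshop, from which it can be deployed.
    registered = blueprint.register_custom_model(
        prompt_column_name="promptText",
        target_column_name="resultText",
    )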

Display controls

The left-hand panel of the playground lists all saved and draft LLM blueprints. Use the controls to modify the display:

The Filter option controls which blueprints are listed in the panel, either by base LLM or status:

The small number to the right of the Filter label indicates how many blueprints are displayed as a result of any (or no) applied filtering.

Sort by controls the ordering of the blueprints. It is additive, meaning that it is applied on top of any filtering or grouping in place:

Group by, also additive, arranges the display by the selected criteria. Labels indicate the group "name," with a count of member blueprints.

Deep dive: Few-shot prompting

Few-shot prompting is a technique for generating or classifying text based on a limited number of examples or prompts—"in-context learning." The examples, or "shots," condition a model to follow patterns in the provided context; it can then generate coherent and contextually relevant text even if it has never seen similar examples during training. This is in contrast to traditional machine learning, where models typically require a large amount of labeled training data. Few-shot prompting makes the model a good candidate for tasks like text generation, text summarization, translation, question-answering, and sentiment analysis without requiring fine-tuning on a specific dataset.

A simple example of few-shot prompting is categorizing customer feedback as positive or negative: after the model is shown a few labeled examples of positive and negative feedback, it can assign a rating to unclassified feedback based on those examples. Few-shot prompting means showing the model two or more examples; zero-shot (no examples) and one-shot (a single example) prompting are related techniques.
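
The pattern is straightforward to express in code. A minimal sketch of assembling zero-, one-, or few-shot prompts from labeled examples (the data and function are invented for illustration):

    def build_prompt(instruction, examples, query):
        # Zero examples is zero-shot, one is one-shot, and two or more
        # make this a few-shot prompt.
        shots = "\n".join(f"Input: {text}\nOutput: {label}"
                          for text, label in examples)
        return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"

    examples = [
        ("Great product, works perfectly!", "positive"),
        ("Arrived broken and support never replied.", "negative"),
    ]
    print(build_prompt("Classify the feedback as positive or negative.",
                       examples,
                       "Setup was painless."))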

The following shows use of few-shot prompting in DataRobot. In the system prompt field, provide a prompt and some examples for learning:

Given the text of a customer support ticket, determine the name of the product it refers to, as well as the issue type. The issue type can be "hardware" or "software". Format the response as JSON with two keys, "product" and "issue_type".

---------------
Examples:

Input: I'm encountering a bug in TPS Report Generator Enterprise Edition. Whenever I click "Generate", the application crashes. Are there any updates or fixes available?
Output: {"product": "TPS Report Generator Enterprise Edition", "issue_type": "software"}

Input: The screen is flickering on my Acme Phone 5+, and I'm unable to use it. What should I do? I want to install a few games and performed a factory reset, hoping it would resolve the problem, but it didn't help.
Output: {"product": "Acme Phone 5+", "issue_type": "hardware"}
---------------

After providing the LLM with that context, try some example prompts:

Prompt: I've noticed a peculiar error message popping up on my PrintPro 9000 screen. It says "PC LOAD LETTER". What does it mean?

Prompt: I cannot install firmware v12.1 on my Print Pro 9002. It says "Incompatible product version".

Prompt: My PrintPro 9001 is making strange noises and not functioning properly. Can you please help me with this?

Save the draft as an LLM blueprint to register it and put it into production.

See the MIT Prompt Engineering Guide for more detailed information.


Updated April 14, 2024