Chatting is the activity of sending prompts and receiving a response from the LLM. A chat is a collection of chat prompts. Once you have set the configuration for your LLM, send it prompts (from the entry box in the lower part of the panel) to determine whether further refinements are needed before considering your LLM blueprint for deployment.

Chatting within the playground is a "conversation"—you can ask follow-up questions with subsequent prompts. Following is an example of asking the LLM to provide Python code for running DataRobot Autopilot:

The results of the follow-up questions depend on whether context awareness is enabled (see continuation of the example). Use the playground to test and tune prompts until you are satisfied with the system prompt and settings. Then, click Save configuration at the bottom of the right-hand panel.

Context-aware chatting

When configuring an LLM blueprint, you set the history awareness in the Prompting tab.

There are two states of context. They control whether chat history is sent with the prompt to include relevant context for responses.

State | Description
Context-aware | When sending input, previous chat history is included with the prompt. This state is the default.
No context | Sends each prompt as independent input, without history from the chat.

You can switch between one-time (no context) and context-aware within a chat. Each state becomes an independent set of history context: going from context-aware to no context and back to context-aware clears the earlier history from the prompt. (This only happens once a new prompt is submitted.)
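
The switching behavior described above can be sketched in a few lines of Python. This is only an illustrative model of the documented behavior, not DataRobot's implementation or API: the `Turn` class and `history_for_next_prompt` function are hypothetical names, and the rule that a no-context turn acts as a divider between history sets is one reading of the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    prompt: str
    response: str
    context_aware: bool  # context state in effect when the prompt was submitted


def history_for_next_prompt(turns: list[Turn], context_aware: bool) -> list[Turn]:
    """Return the earlier turns that would accompany the next prompt (hypothetical model)."""
    if not context_aware:
        # No context: the prompt is sent as independent input.
        return []
    # Context-aware: walk backwards and collect turns until the most recent
    # no-context turn, which is assumed to divide one history set from the next.
    included: list[Turn] = []
    for turn in reversed(turns):
        if not turn.context_aware:
            break
        included.append(turn)
    included.reverse()
    return included


# Placeholder chat; the responses are stand-ins, not real LLM output.
chat = [
    Turn("Write Python code to run Autopilot.", "<code>", context_aware=True),
    Turn("Add comments to that code.", "<commented code>", context_aware=True),
    Turn("What is a vector database?", "<definition>", context_aware=False),
]

before_switch = history_for_next_prompt(chat[:2], context_aware=True)
after_switch_back = history_for_next_prompt(chat, context_aware=True)
print(len(before_switch), len(after_switch_back))
```

In this model, a third context-aware prompt sent before the switch would carry both earlier turns, while a context-aware prompt sent after the no-context turn starts from an empty history.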

Context state is reported in two ways:

  1. A badge, which displays to the right of the LLM blueprint name in both configuration and comparison views, reports the current context state:

  2. In the configuration view, dividers show the state of the context setting:

Using the example above, you could then prompt to make a change to "that code." With context-aware enabled, the LLM responds knowing the code being referenced because it is "aware" of the previous conversation history:

See the prompting reference for information on crafting optimized prompts (including few-shot prompting).

Single vs comparison chats

Chatting with a single LLM blueprint is a good way to tune before starting prompt comparisons with other LLM blueprints. Comparison lets you compare responses between LLM blueprints to help decide which to move to production.


You can only do comparison prompting with LLM blueprints that you created. To see the results of prompting another user’s LLM blueprint in a shared Use Case, copy the blueprint; you can then chat with the same settings applied. This is intentional behavior because prompting an LLM blueprint impacts the chat history, which can impact the responses that are generated. However, you can provide response feedback to assist development.

Single LLM blueprint chat

When you first configure an LLM blueprint, part of the creation process includes chatting. Set the configuration, and save, to activate chatting:

After seeing chat results, tune the configuration, if desired, and prompt again. Use the additional actions available within each chat result to retrieve more information about the prompt and its response:

Option | Description
View configuration | Shows the configuration used by that prompt in the Configuration panel on the right. If you haven't changed configurations while chatting, no change is apparent. Using this tool allows you to recall previous settings and restore the LLM blueprint to those settings.
Open tracing | Opens the tracing log, which shows all components and prompting activity used in generating LLM responses.
Delete prompt and response | Removes both the prompt and response from the chat history. If deleted, they are no longer considered as context for future responses.

As you send prompts to the LLM, DataRobot maintains a record of those chats. You can either add to the context of an existing chat or start a new chat, which does not carry over any of the context from other chats in the history:

Starting a new chat allows you to have multiple independent conversation threads with a single blueprint. In this way, you can evaluate the LLM blueprint based on different types of topics, without bringing in the history of the previous prompt response, which could "pollute" the answers. While you could also do this by switching context off, submitting a prompt, and then switching it back on, starting a new chat is a simpler solution.

Click Start new chat to begin with a clean history; DataRobot will rename the chat from New chat to the words from your prompt once the prompt is submitted.

Comparison LLM blueprint chat

Once you are satisfied, click Comparison in the breadcrumbs to compare responses with other LLM blueprints.

If you determine that further tuning is needed after having started a comparison, you can still modify the configuration of individual LLM blueprints:

To compare LLM blueprint chats side-by-side, see the LLM blueprint comparison documentation.

Response feedback

Use the response feedback "thumbs" to rate the prompt answer. Responses are recorded in the User feedback column of the Tracing tab. The response, as part of the exported feedback sent to the AI Catalog, can be used, for example, to train a predictive model.


Citations are a metric and are on by default (as are Latency, Prompt Tokens, and Response Tokens). Citations provide a list of the top reference document chunks, based on relevance to the prompt, retrieved from the vector database (VDB). Be aware that the embedding model used to create the VDB can affect the quality of the citations retrieved.
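
To make "top chunks based on relevance to the prompt" concrete, here is a minimal sketch of one common way to rank chunks: cosine similarity between a prompt embedding and each chunk embedding. The source does not state how relevance is computed, so the similarity measure, the `top_citations` function, and the hand-written three-dimensional vectors are all assumptions for illustration. A real VDB would use vectors produced by its embedding model, which is why the choice of that model affects which chunks are returned.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_citations(prompt_vec, chunks, k=3):
    """Return the k (score, text) pairs most similar to the prompt (hypothetical helper)."""
    scored = [(cosine_similarity(prompt_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy chunks with made-up embeddings, for illustration only.
chunks = [
    ("Autopilot builds and ranks models.", [0.9, 0.1, 0.0]),
    ("Billing is handled monthly.", [0.0, 0.2, 0.9]),
    ("Start Autopilot from a project.", [0.8, 0.3, 0.1]),
]
prompt_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded prompt

citations = top_citations(prompt_vec, chunks, k=2)
for score, text in citations:
    print(f"{score:.3f}  {text}")
```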


Citations only appear when the LLM blueprint being queried has an associated VDB. While citations are one of the available metrics, you do not need the assessment functionality enabled to have citations returned.

Use citations as a safety check to validate LLM responses. Citations also let you confirm that retrieval from the VDB is proper and appropriate: are you retrieving the chunks from your docs that you want to provide as context to the LLM? Additionally, if you enable the Faithfulness metric, which measures whether the LLM response matches the source, the metric relies on the citation output for its relevance.

Confidence scores

Confidence scores are computed using the factual consistency metric approach, where a similarity score is computed between the facts retrieved from the vector database and the text generated by the LLM blueprint. The similarity metric used is ROUGE-1. DataRobot GenAI uses an improved version of ROUGE-1 based on insights from "The limits of automatic summarization according to ROUGE".
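
For reference, standard ROUGE-1 measures unigram overlap between a reference text and a candidate text. The sketch below implements that plain version only. It is not the improved variant that DataRobot uses, whose details are not given here, and the tokenizer, the `rouge1` function name, and the example sentences are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """Plain ROUGE-1: clipped unigram overlap as precision, recall, and F1."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: "retrieved" stands in for text from the vector database,
# "generated" for the LLM blueprint's response.
retrieved = "Autopilot trains many models and ranks them"
generated = "Autopilot trains and ranks models"

scores = rouge1(retrieved, generated)
print(scores)
```

Here every word of the generated text appears in the retrieved text, so precision is 1.0, while recall is 5/7 because two reference words ("many", "them") are not covered.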

Updated May 14, 2024