
Chatting

Chatting is the activity of sending prompts and receiving a response from the LLM. A chat is a collection of chat prompts. Once you have set the configuration for your LLM, send it prompts (from the entry box in the lower part of the panel) to determine whether further refinements are needed before considering your LLM blueprint for deployment.

Chatting within the playground is a "conversation"—you can ask follow-up questions with subsequent prompts. Following is an example of asking the LLM to provide Python code for running DataRobot Autopilot:

The results of follow-up questions depend on whether context awareness is enabled (see the continuation of the example). Use the playground to test and tune prompts until you are satisfied with the system prompt and settings. Then, click Save configuration at the bottom of the right-hand panel.

Context-aware chatting

When configuring an LLM blueprint, you set the history awareness in the Prompting tab.

There are two states of context. They control whether chat history is sent with the prompt to include relevant context for responses.

Context-aware: When sending input, previous chat history is included with the prompt. This is the default state.
No context: Sends each prompt as independent input, without history from the chat.

Note

Consider the context state and how it functions in conjunction with the selected retriever method.

You can switch between one-time (no context) and context-aware prompting within a chat. Each state maintains an independent set of history context: going from context-aware, to no context, and back to context-aware clears the earlier history from the prompt. (This happens only once a new prompt is submitted.)
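
Conceptually, the two states differ only in what is sent to the LLM with each prompt. The following sketch uses a generic chat-completion message format (not the DataRobot API) to illustrate the difference:

```python
# Illustration only: a generic chat-completion payload, not the DataRobot API.
chat_history = [
    {"role": "user", "content": "Write Python code to run DataRobot Autopilot."},
    {"role": "assistant", "content": "...generated code..."},
]
new_prompt = {"role": "user", "content": "Add comments to that code."}

# Context-aware: previous turns travel with the new prompt, so the LLM
# knows what "that code" refers to.
context_aware_payload = chat_history + [new_prompt]

# No context: only the new prompt is sent; the LLM has no knowledge of
# the earlier exchange.
no_context_payload = [new_prompt]
```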

Context state is reported in two ways:

  1. A badge, which displays to the right of the LLM blueprint name in both configuration and comparison views, reports the current context state:

  2. In the configuration view, dividers show the state of the context setting:

Using the example above, you could then prompt to make a change to "that code." With context-aware enabled, the LLM responds knowing the code being referenced because it is "aware" of the previous conversation history:

See the prompting reference for information on crafting optimized prompts (including few-shot prompting).

Single vs comparison chats

Chatting with a single LLM blueprint is a good way to tune before starting prompt comparisons with other LLM blueprints. Comparison lets you compare responses between LLM blueprints to help decide which to move to production.

Note

You can only do comparison prompting with LLM blueprints that you created. To see the results of prompting another user's LLM blueprint in a shared Use Case, copy the blueprint; you can then chat with the same settings applied. This is intentional behavior because prompting an LLM blueprint impacts the chat history, which can impact the responses that are generated. However, you can provide response feedback to assist development.

Single LLM blueprint chat

When you first configure an LLM blueprint, part of the creation process includes chatting. Set and save the configuration to activate chatting:

After seeing chat results, tune the configuration, if desired, and prompt again. Use the additional actions available within each chat result to retrieve more information about the prompt and its response:

View configuration: Shows the configuration used by that prompt in the Configuration panel on the right. If you haven't changed configurations while chatting, no change is apparent. Using this tool allows you to recall previous settings and restore the LLM blueprint to those settings.
Open tracing: Opens the tracing log, which shows all components and prompting activity used in generating LLM responses.
Delete prompt and response: Removes both the prompt and response from the chat history. If deleted, they are no longer considered as context for future responses.

As you send prompts to the LLM, DataRobot maintains a record of those chats. You can either add to the context of an existing chat or start a new chat, which does not carry over any of the context from other chats in the history:

Starting a new chat allows you to have multiple independent conversation threads with a single blueprint. In this way, you can evaluate the LLM blueprint based on different types of topics, without bringing in the history of the previous prompt response, which could "pollute" the answers. While you could also do this by switching context off, submitting a prompt, and then switching it back on, starting a new chat is a simpler solution.

Click Start new chat to begin with a clean history; DataRobot will rename the chat from New chat to the words from your prompt once the prompt is submitted.

Comparison LLM blueprint chat

Once you are satisfied, click Comparison in the breadcrumbs to compare responses with other LLM blueprints.

If you determine that further tuning is needed after having started a comparison, you can still modify the configuration of individual LLM blueprints:

To compare LLM blueprint chats side-by-side, see the LLM blueprint comparison documentation.

Response feedback

Use the response feedback "thumbs" to rate the prompt answer. Ratings are recorded in the User feedback column of the Tracing tab. The response, as part of the exported feedback sent to the AI Catalog, can be used, for example, to train a predictive model.

Citations

Citations are a metric and are on by default (as are Latency, Prompt Tokens, and Response Tokens). Citations provide a list of the top reference document chunks retrieved from the vector database, ranked by relevance to the prompt. Be aware that the embedding model used to create the vector database can affect the quality of the citations retrieved.

Note

Citations only appear when the LLM blueprint being queried has an associated vector database. While citations are one of the available metrics, you do not need the assessment functionality enabled to have citations returned.

Use citations as a safety check to validate LLM responses. They also allow you to validate proper and appropriate retrieval from the vector database: are you retrieving the chunks from your docs that you want to provide as context to the LLM? Additionally, if you enable the Faithfulness metric, which measures whether the LLM response matches the source, be aware that it relies on the citation output.

ROUGE scores

ROUGE scores, also known as confidence scores, calculate the distance between the response generated from an LLM blueprint and the documents retrieved from the vector database. They indicate how close the response is to the provided context. ROUGE scores are computed using the factual consistency metric approach, where a score is computed from the facts retrieved from the vector database and the generated text from the LLM blueprint. The similarity metric used is ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation). DataRobot GenAI uses an improved version of ROUGE-1 based on insights from "The limits of automatic summarization according to ROUGE". The ROUGE scoring algorithm is not scaled; instead, DataRobot uses heuristic coefficients.
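
For intuition, ROUGE-1 measures unigram overlap between the generated response and the retrieved reference text. The sketch below computes plain, unweighted ROUGE-1 recall for illustration only; it is not the DataRobot implementation, which applies heuristic coefficients:

```python
from collections import Counter

def rouge1_recall(reference: str, generated: str) -> float:
    """Fraction of reference unigrams that also appear in the generated text."""
    ref_counts = Counter(reference.lower().split())
    gen_counts = Counter(generated.lower().split())
    overlap = sum(min(count, gen_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# Compare an LLM response against a chunk retrieved from the vector database.
chunk = "To deploy a model, open the Registry and add the model to a deployment."
response = "You deploy a model by opening the Registry and adding it to a deployment."
print(round(rouge1_recall(chunk, response), 2))
```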

The ROUGE score is reported in the prompt response:

Similarity score

The similarity score is based on the distance of the query embedding to the chunk embedding in the vector space; the larger the similarity score, the better. Do not use this score to compare results between vector databases; use it only to compare retrievals within a single vector database. The score tells you that "this chunk is more similar than that chunk to the given query."

The similarity score is reported in the citation:

How is the similarity score calculated?

TLDR: The similarity score displayed in the citation is based on the Hamming distance returned by the binary index during vector search, rescored to cosine similarities with the float embeddings of the user query, and then rounded to two decimals.

Similarity scores are based on the distance scores that Facebook AI Similarity Search (FAISS) returns alongside the indices as a result of the vector search. These binary indices use the Hamming distance as their distance function to find the top_k nearest vectors for a given query vector. Instead of using the raw distance scores (which are integers because the vectors in the binary indexes are binary quantized), DataRobot uses a rescoring method adopted from sentence transformers:

  1. Retrieve rescore_multiplier * top_k results with the binary query embedding and the binary document embeddings (i.e., the list of the first k results of the binary retrieval).

  2. Rescore that list of binary document embeddings with the initial (before they got quantized) float query embeddings.

    Rescoring performs a dot product operation between float vectors, returning cosine similarities (multiply corresponding elements of the vectors, then sum the results to produce a single scalar value). Applying this rescoring step preserves total retrieval performance, reduces memory and disk space usage, and improves retrieval speed.

  3. Finally, DataRobot rounds the scores to two digits because of floating point arithmetic precision issues caused by numpy, binary quantization, and the rescoring method.

Because rescoring is performing a dot product with the embeddings, which leads to a cosine similarity, the higher the value, the better.
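
The following numpy sketch ties these steps together (binary quantization, Hamming-distance retrieval, float-query rescoring, rounding). It is a simplified illustration of the approach described above, not DataRobot's internal code:

```python
import numpy as np

def retrieve_and_rescore(query_emb, doc_embs, top_k=3, rescore_multiplier=4):
    """Toy binary retrieval followed by float-query rescoring.

    query_emb: 1-D float array for the user query (assumed normalized).
    doc_embs:  2-D float array of document chunk embeddings (assumed normalized).
    """
    # Binary quantization: keep only the sign of each dimension.
    query_bits = query_emb > 0
    doc_bits = doc_embs > 0

    # Hamming distance between the binary query and each binary document.
    hamming = np.count_nonzero(doc_bits != query_bits, axis=1)

    # Step 1: retrieve rescore_multiplier * top_k candidates by smallest Hamming distance.
    n_candidates = min(rescore_multiplier * top_k, len(doc_embs))
    candidates = np.argsort(hamming)[:n_candidates]

    # Step 2: rescore the binary candidate embeddings with the float query embedding
    # (a dot product that approximates cosine similarity; higher is better).
    scores = doc_bits[candidates].astype(np.float32) @ query_emb

    # Step 3: keep the top_k highest scores and round to two decimals.
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(candidates[i]), round(float(scores[i]), 2)) for i in order]
```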

Metadata filtering

Use metadata filtering to limit the citations returned by the prompt query. When configured, the LLM blueprint only returns chunks that include the specified metadata column-value pair. You can add a filter for each metadata column, as needed. Each metadata column can be paired with a single value.
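
Conceptually, a metadata filter restricts retrieval to chunks whose metadata exactly matches the configured column-value pairs. The minimal sketch below illustrates that behavior using a hypothetical chunk structure; it is not the DataRobot implementation:

```python
# Hypothetical chunk records: text plus a per-chunk metadata dictionary.
chunks = [
    {"text": "...", "metadata": {"source": "docs/en/deploy-model.txt", "topic": "deployment"}},
    {"text": "...", "metadata": {"source": "docs/en/autopilot.txt", "topic": "modeling"}},
]

# One value per metadata column; matching is an exact string comparison.
filters = {"source": "docs/en/deploy-model.txt"}

eligible = [
    chunk for chunk in chunks
    if all(chunk["metadata"].get(column) == value for column, value in filters.items())
]
```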

Note

Vector databases created before the introduction of metadata filtering do not support this feature. To use filtering with them, create a version from the original and configure the LLM blueprint to use the new vector database instead.

To create a metadata filter, click Filter metadata below the prompt entry box.

All optional metadata column names appear in the dropdown, as well as the option source, which is content from the document_file_path column. If the vector database includes no optional metadata, only source is available for selection. Select a column name and enter a single value. The value must be an exact string match from the vector database (partial matches are not allowed). Use Add filter to add filters for different metadata columns.

Note

If the value entered for source is not an exact match, a response is still returned but no citations are available, because nothing matched the filter.

If the LLM blueprint configuration does not include a vector database, clicking Filter metadata displays the following:

To enable filtering, add a vector database that includes, at minimum, the required document and document_file_path (shown as source in filtering) columns.

Metadata filtering example

The following example, which uses the DataRobot documentation as the vector database, compares results to the same prompt ("how do I deploy a model?") with and without a metadata filter applied.

The image on the left has no filtering. The image on the right sets source to datarobot_english_documentation/datarobot_docs|en|mlops|deployment|deploy-methods|deploy-model.txt, which is a value in the document_file_path column of the data source. Notice the differences, particularly in prompt tokens and ROUGE score.

When you open citations for the filtered prompt, you can see the source is only the one path:


Updated January 30, 2025