# Tensile: Enhanced agent reliability through automated testing
Building and maintaining reliable AI agents is challenging. Agents must stay on-task, follow policies, and recover from failures in a way you can measure and improve. This accelerator introduces Tensile, a test-driven development framework in DataRobot for improving the reliability, task performance, and policy adherence of AI agents through automated test synthesis and trajectory analysis.
Tensile helps you instrument agents, capture execution trajectories, and turn successes and failures into repeatable tests. You can then evaluate and replay runs, compare system prompt changes, and use clustering and contextual hint injection to remediate issues iteratively.
In this accelerator you will:
- Instrument an agent with `TrajectoryLogger` to record execution trajectories.
- Analyze trajectories to identify testable moments (successes and failures).
- Evaluate and replay runs to quantify improvements and compare system prompt changes.
- Configure Tensile with the DataRobot LLM gateway.
- Use clustering (Dash app and `ClusteringHintInjector`) to explore issues and inject contextual hints.
- Apply the Trajectory Analyzer workflow with `ProgrammaticHintInjector` for iterative improvement.
## Prerequisites
Before running the accelerator, ensure you have:
- Tensile installed (see the quickstart below).
- A `config.yaml` with LLM and trajectory settings.
- For DataRobot: set `DATAROBOT_API_TOKEN` in `test.env` (or in your environment). Optionally set `DATAROBOT_LLM_GATEWAY_URL` and `DATAROBOT_TRACE_CONTEXT` for observability.
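The DataRobot variables above typically live in `test.env`; a minimal sketch, with placeholder values:

```
DATAROBOT_API_TOKEN=<your_api_token>
# Optional, for observability:
DATAROBOT_LLM_GATEWAY_URL=<llm_gateway_url>
DATAROBOT_TRACE_CONTEXT=<trace_context>
```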
Quickstart from the project root:
```shell
uv venv --python 3.13
uv sync; pre-commit install
uv pip install -e .
cp config.yaml.sample config.yaml  # And fill in credentials
tensile  # show help
```
## Instrument an agent for trajectory logging
Use `TrajectoryLogger` as the transport for an httpx client, then pass that client into your OpenAI-compatible agent. Trajectories are written to `<trajectory_dir>/<subdir>` (with `trajectory_dir` set in `config.yaml`).
```python
import httpx
from openai import AsyncOpenAI

from tensile.logging import TrajectoryLogger

http_client = httpx.AsyncClient(
    transport=TrajectoryLogger(
        httpx.AsyncHTTPTransport(),
        trajectory_subdir="<subdir>",  # or None
    )
)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)
```
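The trajectory output location comes from `config.yaml`. A minimal sketch; the `trajectory_dir` key is mentioned above, but the exact layout of the rest of the file is an assumption (see `config.yaml.sample` for the authoritative template):

```yaml
# config.yaml (sketch; layout beyond trajectory_dir is an assumption)
trajectory_dir: trajectories
```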
## Analyze trajectories and evaluate testable moments
Run the analysis pipeline (outputs to `analysis_output/` by default):

```shell
tensile analyze <trajectory_file>
```
To run testable moments manually (for example, 10 times):

```shell
tensile test <moment_path> -n 10
```
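Repeated runs of a moment give a pass/fail sample you can compare across system prompt versions. A minimal sketch of that comparison, assuming you tally each run's outcome as True/False yourself (how `tensile test` reports results is not shown here, and the values below are invented):

```python
# Hypothetical pass/fail outcomes from two batches of 10 runs of the same
# testable moment, before and after a system prompt change (values invented).
baseline = [True, True, False, True, True, False, True, True, True, False]
improved = [True, True, True, True, True, False, True, True, True, True]

def pass_rate(runs: list[bool]) -> float:
    """Fraction of runs that passed."""
    return sum(runs) / len(runs)

print(f"baseline: {pass_rate(baseline):.0%}")  # baseline: 70%
print(f"improved: {pass_rate(improved):.0%}")  # improved: 90%
```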
## Replay trajectories
Replay steps in a trajectory to collect new LLM responses, spot flukes, or compare behavior after system prompt changes. Omit `output_path` to write to `<trajectory_file>.replay.jsonl`.
```shell
tensile replay <trajectory_file> [output_path]
tensile replay <trajectory_file> --num-replays 5
tensile replay <trajectory_file> --num-replays 3 --max-concurrency 10
tensile replay <trajectory_file> --num-replays 3 --system-prompt-path <system_prompt_path_txt>

# Examples
tensile replay <trajectory_file>
tensile replay <trajectory_file> -n 5
tensile replay <trajectory_file> output/replay.jsonl -n 3
```
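Replay output is JSONL (one JSON object per line). A minimal sketch for inspecting such a file with only the standard library; the two-record sample written below is invented, since the per-record schema isn't documented here:

```python
import json
from pathlib import Path

def summarize_jsonl(path: Path) -> tuple[int, list[str]]:
    """Return the record count and sorted top-level keys of a JSONL file."""
    records = [
        json.loads(line)
        for line in path.read_text().splitlines()
        if line.strip()
    ]
    keys = sorted({key for record in records for key in record})
    return len(records), keys

# Stand-in file; a real run would point at <trajectory_file>.replay.jsonl
sample = Path("sample.replay.jsonl")
sample.write_text('{"step": 0, "response": "a"}\n{"step": 1, "response": "b"}\n')
print(summarize_jsonl(sample))  # (2, ['response', 'step'])
```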
## Configuration

### DataRobot LLM gateway
Add the following to your `config.yaml` to use the DataRobot LLM gateway:
```yaml
# config.yaml
llm:
  name: "<model_name>"  # e.g., vertex_ai/gemini-3-pro-preview
  api_base: "<llm_gateway_url>"
  api_key: "<your_api_token>"
```
## Clustering

### Clustering app
Start the Dash app to explore and cluster analysis outputs in the browser. It requires the dev dependency group; with uv, run:
```shell
task dev-env
task apps:clustering
```
### Clustering-based hint injection
Use `ClusteringHintInjector` with `analysis_dirs` and `trajectories_dirs` pointing at your Tensile outputs, plus a report store (`InMemoryReportStore` or `FileSystemReportStore`). Example:
```python
from pathlib import Path

import httpx
from openai import AsyncOpenAI

from tensile.logging.hint_injector import (
    ClusteringHintConfig,
    ClusteringHintInjector,
    InMemoryReportStore,
    SentenceTransformersEmbeddingBackend,
)

base_transport = httpx.AsyncHTTPTransport()
embedding_backend = SentenceTransformersEmbeddingBackend(
    model_name="<embedding_model_name>",
)
report_store = InMemoryReportStore()
config = ClusteringHintConfig(
    analysis_dirs=[Path("analysis_output")],
    trajectories_dirs=[Path("trajectories")],
)
hinting_transport = ClusteringHintInjector(
    base_transport,
    embedding_backend=embedding_backend,
    report_store=report_store,
    config=config,
)
http_client = httpx.AsyncClient(transport=hinting_transport)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)
```
## Trajectory Analyzer workflow
- Instrument the agent with `ProgrammaticHintInjector` and `TrajectoryLogger`:
```python
import httpx

from tensile.logging import TrajectoryLogger
from tensile.logging.hint_injector.programmatic_hint_injector import ProgrammaticHintInjector

http_client = httpx.AsyncClient(
    transport=ProgrammaticHintInjector(
        wrapped=TrajectoryLogger(
            wrapped=httpx.AsyncHTTPTransport(),
            trajectory_subdir="<subdir>",
        ),
        # Start with hint_file_path=None until the analyzer generates a hint file
        hint_file_path=None,
    )
)
```
- Run the agent to produce a trajectory.
- Run `tensile analyze <trajectory_path>`. When analysis finishes, copy the generated `hints.json`, updated system prompt, and/or updated tool definitions back into your agent.
- Set `hint_file_path` to the path of the `hints.json` file and run the agent again to produce a new trajectory.
- Run `tensile analyze <new_traj_path> --hints-file <path_to_hints.json>` to re-analyze with the new hints.
- Repeat until behavior converges.