
Tensile: Enhanced agent reliability through automated testing

Building and maintaining reliable AI agents is challenging. Agents must stay on-task, follow policies, and recover from failures in a way you can measure and improve. This accelerator introduces Tensile, a test-driven development framework in DataRobot for improving the reliability, task performance, and policy adherence of AI agents through automated test synthesis and trajectory analysis.

Tensile helps you instrument agents, capture execution trajectories, and turn successes and failures into repeatable tests. You can then evaluate and replay runs, compare system prompt changes, and use clustering and contextual hint injection to remediate issues iteratively.

In this accelerator you will:

  • Instrument an agent with TrajectoryLogger to record execution trajectories.
  • Analyze trajectories to identify testable moments (successes and failures).
  • Evaluate and replay runs to quantify improvements and compare system prompt changes.
  • Configure Tensile with the DataRobot LLM gateway.
  • Use clustering (Dash app and ClusteringHintInjector) to explore issues and inject contextual hints.
  • Apply the Trajectory Analyzer workflow with ProgrammaticHintInjector for iterative improvement.

Prerequisites

Before running the accelerator, ensure you have:

  • Tensile installed (see the quickstart below).
  • A config.yaml with LLM and trajectory settings.
  • For DataRobot: set DATAROBOT_API_TOKEN in test.env (or in your environment). Optionally set DATAROBOT_LLM_GATEWAY_URL and DATAROBOT_TRACE_CONTEXT for observability.
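For reference, a minimal test.env might look like the following (values are placeholders; only DATAROBOT_API_TOKEN is required):

```
DATAROBOT_API_TOKEN=<your_api_token>
# Optional, for observability:
DATAROBOT_LLM_GATEWAY_URL=<llm_gateway_url>
DATAROBOT_TRACE_CONTEXT=<trace_context>
```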

Quickstart from the project root:

uv venv --python 3.13
uv sync; pre-commit install
uv pip install -e .
cp config.yaml.sample config.yaml   # And fill in credentials
tensile   # show help

Instrument an agent for trajectory logging

Use TrajectoryLogger as the transport for an httpx client, then pass that client into your OpenAI-compatible agent. Trajectories are written to <trajectory_dir>/<subdir> (where trajectory_dir is set in config.yaml).

import httpx
from openai import AsyncOpenAI

from tensile.logging import TrajectoryLogger

http_client = httpx.AsyncClient(
    transport=TrajectoryLogger(
        httpx.AsyncHTTPTransport(),
        trajectory_subdir=<subdir>,  # or None to write directly to trajectory_dir
    )
)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)
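After a run, you can inspect the captured trajectories directly. The sketch below assumes trajectory files are JSON Lines (one record per line) under the configured trajectory directory; `summarize_trajectories` is a hypothetical helper for illustration, not part of Tensile:

```python
import json
from pathlib import Path


def summarize_trajectories(trajectory_dir: str) -> dict[str, int]:
    """Count logged records per .jsonl trajectory file in a directory.

    Assumes one JSON record per non-empty line, which matches the
    .jsonl naming convention Tensile uses for replay output.
    """
    counts: dict[str, int] = {}
    for path in sorted(Path(trajectory_dir).glob("*.jsonl")):
        with path.open() as f:
            counts[path.name] = sum(1 for line in f if line.strip())
    return counts
```

A quick per-file record count like this is a cheap sanity check that the logger is actually capturing traffic before you move on to analysis.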

Analyze trajectories and evaluate testable moments

Run the analysis pipeline (outputs to analysis_output/ by default):

tensile analyze <trajectory_file>

To run testable moments manually (for example, 10 times):

tensile test <moment_path> -n 10

Replay trajectories

Replay steps in a trajectory to collect new LLM responses, spot flukes, or compare behavior after system prompt changes. Omit output_path to write to <trajectory_file>.replay.jsonl.

tensile replay <trajectory_file> [output_path]
tensile replay <trajectory_file> --num-replays 5
tensile replay <trajectory_file> --num-replays 3 --max-concurrency 10
tensile replay <trajectory_file> --num-replays 3 --system-prompt-path <system_prompt_path_txt>

# Examples
tensile replay <trajectory_file>
tensile replay <trajectory_file> -n 5
tensile replay <trajectory_file> output/replay.jsonl -n 3
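Replay output can then be mined for flukes. The sketch below is a hypothetical helper (not part of Tensile) that assumes replay records are JSON Lines with `step` and `response` fields; adjust the field names to match your actual replay files:

```python
import json
from collections import defaultdict
from pathlib import Path


def find_unstable_steps(replay_path: str) -> dict[int, set[str]]:
    """Group replayed responses by step and return the steps whose
    responses disagree across replays (candidate flukes)."""
    responses: dict[int, set[str]] = defaultdict(set)
    for line in Path(replay_path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        responses[record["step"]].add(record["response"])
    # Keep only the steps with more than one distinct response
    return {step: r for step, r in responses.items() if len(r) > 1}
```

Steps that produce several distinct responses across replays are good candidates for contextual hints or system prompt changes.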

Configuration

DataRobot LLM gateway

Add the following to your config.yaml to use the DataRobot LLM gateway:

# config.yaml
llm:
  name: "<model_name>"       # e.g., vertex_ai/gemini-3-pro-preview
  api_base: "<llm_gateway_url>"
  api_key: "<your_api_token>"

Clustering

Clustering app

Start the Dash app to explore and cluster analysis outputs in the browser. It requires the dev dependency group; with uv, run:

task dev-env
task apps:clustering

Clustering-based hint injection

Use ClusteringHintInjector with analysis_dirs and trajectories_dirs pointing at your Tensile outputs and a report store (InMemoryReportStore or FileSystemReportStore). Example:

from pathlib import Path

import httpx
from openai import AsyncOpenAI

from tensile.logging.hint_injector import (
    ClusteringHintConfig,
    ClusteringHintInjector,
    InMemoryReportStore,
    SentenceTransformersEmbeddingBackend,
)

base_transport = httpx.AsyncHTTPTransport()
embedding_backend = SentenceTransformersEmbeddingBackend(
    model_name="<embedding_model_name>",
)
report_store = InMemoryReportStore()
config = ClusteringHintConfig(
    analysis_dirs=[Path("analysis_output")],
    trajectories_dirs=[Path("trajectories")],
)

hinting_transport = ClusteringHintInjector(
    base_transport,
    embedding_backend=embedding_backend,
    report_store=report_store,
    config=config,
)

http_client = httpx.AsyncClient(transport=hinting_transport)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)

Trajectory Analyzer workflow

  1. Instrument the agent with ProgrammaticHintInjector and TrajectoryLogger:
import httpx

from tensile.logging import TrajectoryLogger
from tensile.logging.hint_injector.programmatic_hint_injector import ProgrammaticHintInjector

http_client = httpx.AsyncClient(
    transport=ProgrammaticHintInjector(
        wrapped=TrajectoryLogger(
            wrapped=httpx.AsyncHTTPTransport(),
            trajectory_subdir=<subdir>,
        ),
        # Start with hint_file_path=None until the analyzer generates a hint file
        hint_file_path=None,
    )
)
  2. Run the agent to produce a trajectory.
  3. Run tensile analyze <trajectory_path>. When analysis finishes, copy the generated hints.json, updated system prompt, and/or updated tool definitions back into your agent.
  4. Set hint_file_path to the path of the hints.json file and run the agent again to produce a new trajectory.
  5. Run tensile analyze <new_traj_path> --hints-file <path_to_hints.json> to re-analyze with the new hints.
  6. Repeat until behavior converges.