# Tensile - Enhanced agent reliability through automated test

> Tensile - Enhanced agent reliability through automated test - Use DataRobot's test-driven
> development framework to improve the reliability, task performance, and policy adherence of AI
> agents with trajectory logging, replay, and contextual hints.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.267909+00:00` (UTC).

## Primary page

- [Tensile - Enhanced agent reliability through automated test](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html): Full documentation for this topic (HTML).

## Sections on this page

- [Prerequisites](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#prerequisites): In-page section heading.
- [Instrument an agent for trajectory logging](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#instrument-agent): In-page section heading.
- [Analyze trajectories and evaluate testable moments](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#analyze-and-evaluate): In-page section heading.
- [Replay trajectories](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#replay-trajectories): In-page section heading.
- [Configuration](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#configuration): In-page section heading.
- [DataRobot LLM gateway](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#datarobot-llm-gateway): In-page section heading.
- [Clustering](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#clustering): In-page section heading.
- [Clustering app](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#clustering-app): In-page section heading.
- [Clustering-based hint injection](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#clustering-based-hint-injection): In-page section heading.
- [Trajectory Analyzer workflow](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/llm-and-genai-apps/tensile-agent-reliability.html#trajectory-analyzer): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Developer learning](https://docs.datarobot.com/en/docs/api/dev-learning/index.html): Linked from this page.
- [AI accelerators](https://docs.datarobot.com/en/docs/api/dev-learning/accelerators/index.html): Linked from this page.

## Documentation content

# Tensile: Enhanced agent reliability through automated test

Building and maintaining reliable AI agents is challenging. Agents must stay on-task, follow policies, and recover from failures in a way you can measure and improve. This accelerator introduces Tensile, a test-driven development framework in DataRobot for improving the reliability, task performance, and policy adherence of AI agents through automated test synthesis and trajectory analysis.

Tensile helps you instrument agents, capture execution trajectories, and turn successes and failures into repeatable tests. You can then evaluate and replay runs, compare system prompt changes, and use clustering and contextual hint injection to remediate issues iteratively.

In this accelerator you will:

- Instrument an agent with `TrajectoryLogger` to record execution trajectories.
- Analyze trajectories to identify testable moments (successes and failures).
- Evaluate and replay runs to quantify improvements and compare system prompt changes.
- Configure Tensile with the DataRobot LLM gateway.
- Use clustering (Dash app and `ClusteringHintInjector`) to explore issues and inject contextual hints.
- Apply the Trajectory Analyzer workflow with `ProgrammaticHintInjector` for iterative improvement.

## Prerequisites

Before running the accelerator, ensure you have:

- Tensile installed (see the quickstart below).
- A `config.yaml` with LLM and trajectory settings.
- For DataRobot: `DATAROBOT_API_TOKEN` set in `test.env` (or in your environment). Optionally, set `DATAROBOT_LLM_GATEWAY_URL` and `DATAROBOT_TRACE_CONTEXT` for observability.
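
The environment requirement above can be checked before a run. The following is a minimal sketch, not part of Tensile: the function name `check_datarobot_env` is hypothetical, and it inspects only the process environment (it does not parse `test.env`).

```python
import os

# Variable names taken from the prerequisites list above.
REQUIRED = ("DATAROBOT_API_TOKEN",)
OPTIONAL = ("DATAROBOT_LLM_GATEWAY_URL", "DATAROBOT_TRACE_CONTEXT")


def check_datarobot_env(environ=None):
    """Return (missing_required, unset_optional) variable names.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    unset = [name for name in OPTIONAL if not env.get(name)]
    return missing, unset
```

Calling `check_datarobot_env()` with no arguments checks the current environment; a non-empty first list means the required token is not set.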

Quickstart from the project root:

```
uv venv --python 3.13
uv sync; pre-commit install
uv pip install -e .
cp config.yaml.sample config.yaml   # And fill in credentials
tensile   # show help
```

## Instrument an agent for trajectory logging

Use `TrajectoryLogger` as the transport for an `httpx` client, then pass that client into your OpenAI-compatible agent. Trajectories are written to `<trajectory_dir>/<subdir>`, where `trajectory_dir` is set in `config.yaml`.

```
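import httpx
from openai import AsyncOpenAI

from tensile.logging import TrajectoryLogger

# Placeholders: replace with your own values.
api_key = "<your_api_token>"
endpoint_url = "<endpoint_url>"

http_client = httpx.AsyncClient(
    transport=TrajectoryLogger(
        httpx.AsyncHTTPTransport(),
        trajectory_subdir="<subdir>",  # or None
    )
)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)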
```
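
To illustrate why a logger can sit in the transport position, here is a minimal sketch of the wrapping pattern using plain Python stand-ins. It uses neither `httpx` nor Tensile: `EchoTransport`, `RecordingTransport`, and the `handle` method are hypothetical names (real `httpx` async transports implement `handle_async_request`), and the record format is an assumption made for the example only.

```python
import asyncio
import json


class EchoTransport:
    """Hypothetical innermost transport: returns a canned response."""

    async def handle(self, request: dict) -> dict:
        return {"status": 200, "echo": request["body"]}


class RecordingTransport:
    """Hypothetical wrapper: delegates to `wrapped` and records each exchange."""

    def __init__(self, wrapped, records: list):
        self.wrapped = wrapped
        self.records = records

    async def handle(self, request: dict) -> dict:
        response = await self.wrapped.handle(request)
        # Store one JSON line per request/response pair.
        self.records.append(json.dumps({"request": request, "response": response}))
        return response


async def demo() -> list:
    records = []
    transport = RecordingTransport(EchoTransport(), records)
    await transport.handle({"body": "hello"})
    return records
```

Because the wrapper exposes the same method as the transport it wraps, the caller does not need to know that recording is happening, and wrappers can be stacked.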

## Analyze trajectories and evaluate testable moments

Run the analysis pipeline (outputs to `analysis_output/` by default):

```
tensile analyze <trajectory_file>
```

To run testable moments manually (for example, 10 times):

```
tensile test <moment_path> -n 10
```

## Replay trajectories

Replay steps in a trajectory to collect new LLM responses, spot flukes, or compare behavior after system prompt changes. Omit `output_path` to write to `<trajectory_file>.replay.jsonl`.

```
tensile replay <trajectory_file> [output_path]
tensile replay <trajectory_file> --num-replays 5
tensile replay <trajectory_file> --num-replays 3 --max-concurrency 10
tensile replay <trajectory_file> --num-replays 3 --system-prompt-path <system_prompt_path_txt>

# Examples
tensile replay <trajectory_file>
tensile replay <trajectory_file> -n 5
tensile replay <trajectory_file> output/replay.jsonl -n 3
```
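
When comparing several system prompts, it can help to build the replay commands programmatically. The sketch below only assembles argument lists from the options shown above; `build_replay_command` is a hypothetical helper, and combining an explicit output path with `--system-prompt-path` in one call is an assumption not confirmed by the examples.

```python
def build_replay_command(trajectory_file, output_path=None, num_replays=None,
                         max_concurrency=None, system_prompt_path=None):
    """Assemble a `tensile replay` argument list from the documented options."""
    cmd = ["tensile", "replay", str(trajectory_file)]
    if output_path is not None:
        # Positional output path; when omitted, the CLI picks its default.
        cmd.append(str(output_path))
    if num_replays is not None:
        cmd += ["--num-replays", str(num_replays)]
    if max_concurrency is not None:
        cmd += ["--max-concurrency", str(max_concurrency)]
    if system_prompt_path is not None:
        cmd += ["--system-prompt-path", str(system_prompt_path)]
    return cmd


def commands_for_prompts(trajectory_file, prompt_paths, num_replays=3):
    """One replay command per candidate system prompt (hypothetical helper)."""
    return [
        build_replay_command(trajectory_file, num_replays=num_replays,
                             system_prompt_path=path)
        for path in prompt_paths
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the project environment where `tensile` is installed.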

## Configuration

### DataRobot LLM gateway

Add the following to your `config.yaml` to use the DataRobot LLM gateway:

```
# config.yaml
llm:
  name: "<model_name>"       # e.g., vertex_ai/gemini-3-pro-preview
  api_base: "<llm_gateway_url>"
  api_key: "<your_api_token>"
```
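
To avoid pasting a token into `config.yaml` by hand, the `llm` block can be rendered from environment variables. This is a sketch under assumptions: `render_llm_config` is a hypothetical helper, and using `DATAROBOT_LLM_GATEWAY_URL` as the value of `api_base` is an assumption rather than documented behavior. It renders only the `llm` block, not the trajectory settings.

```python
import json
import os


def render_llm_config(model_name, environ=None):
    """Render the `llm` block of config.yaml as text (hypothetical helper)."""
    env = os.environ if environ is None else environ
    api_base = env["DATAROBOT_LLM_GATEWAY_URL"]  # assumed to be the gateway URL
    api_key = env["DATAROBOT_API_TOKEN"]
    # json.dumps produces double-quoted strings, which YAML also accepts.
    lines = [
        "llm:",
        f"  name: {json.dumps(model_name)}",
        f"  api_base: {json.dumps(api_base)}",
        f"  api_key: {json.dumps(api_key)}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be merged into `config.yaml` alongside the other settings.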

## Clustering

### Clustering app

Start the Dash app to explore and cluster analysis outputs in the browser. It requires the `dev` dependency group; with `uv`, run:

```
task dev-env
task apps:clustering
```

### Clustering-based hint injection

Configure `ClusteringHintInjector` with `analysis_dirs` and `trajectories_dirs` pointing at your Tensile outputs, and give it a report store (`InMemoryReportStore` or `FileSystemReportStore`). Example:

```
from pathlib import Path

import httpx
from openai import AsyncOpenAI

from tensile.logging.hint_injector import (
    ClusteringHintConfig,
    ClusteringHintInjector,
    InMemoryReportStore,
    SentenceTransformersEmbeddingBackend,
)

base_transport = httpx.AsyncHTTPTransport()
embedding_backend = SentenceTransformersEmbeddingBackend(
    model_name="<embedding_model_name>",
)
report_store = InMemoryReportStore()
config = ClusteringHintConfig(
    analysis_dirs=[Path("analysis_output")],
    trajectories_dirs=[Path("trajectories")],
)

hinting_transport = ClusteringHintInjector(
    base_transport,
    embedding_backend=embedding_backend,
    report_store=report_store,
    config=config,
)

# api_key and endpoint_url are placeholders for your own credentials and endpoint.
http_client = httpx.AsyncClient(transport=hinting_transport)
client = AsyncOpenAI(
    api_key=api_key,
    base_url=f"{endpoint_url}/v1",
    http_client=http_client,
)
```
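
This page does not describe how `ClusteringHintInjector` selects a hint. As a generic illustration of the idea only, the sketch below picks the hint attached to the cluster centroid most similar to a query embedding. Everything in it is hypothetical: the `pick_hint` function, the `min_similarity` threshold, the two-dimensional embeddings, and the hint texts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_hint(query_embedding, clusters, min_similarity=0.8):
    """Return the hint of the most similar centroid, or None if none is close.

    `clusters` is a list of (centroid, hint) pairs.
    """
    best_hint, best_score = None, min_similarity
    for centroid, hint in clusters:
        score = cosine_similarity(query_embedding, centroid)
        if score >= best_score:
            best_hint, best_score = hint, score
    return best_hint


# Hypothetical clusters of past issues, each with a remediation hint.
clusters = [
    ([1.0, 0.0], "Confirm the order ID before issuing a refund."),
    ([0.0, 1.0], "Quote the relevant policy section in the reply."),
]
```

The threshold keeps unrelated requests from receiving a hint at all, which matters when hints are injected into every matching call.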

## Trajectory Analyzer workflow

1. Instrument the agent with `ProgrammaticHintInjector` and `TrajectoryLogger`:

```
import httpx

from tensile.logging import TrajectoryLogger
from tensile.logging.hint_injector.programmatic_hint_injector import ProgrammaticHintInjector

# Start with hint_file_path=None until the analyzer has generated a hint file.
http_client = httpx.AsyncClient(
    transport=ProgrammaticHintInjector(
        wrapped=TrajectoryLogger(
            wrapped=httpx.AsyncHTTPTransport(),
            trajectory_subdir="<subdir>",
        ),
        hint_file_path=None,
    )
)
```

2. Run the agent to produce a trajectory.
3. Run `tensile analyze <trajectory_path>`. When analysis finishes, copy the generated `hints.json`, updated system prompt, and/or updated tool definitions back into your agent.
4. Set `hint_file_path` to the path of the `hints.json` file and run the agent again to produce a new trajectory.
5. Run `tensile analyze <new_traj_path> --hints-file <path_to_hints.json>` to re-analyze with the new hints.
6. Repeat until behavior converges.
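
The analysis steps of this loop can be scripted. The sketch below only assembles the `tensile analyze` commands for a sequence of trajectories: the first is analyzed without hints and later ones with `--hints-file`, mirroring the steps above. The helper names are hypothetical, and running the agent between iterations is left to your own code.

```python
def build_analyze_command(trajectory_path, hints_file=None):
    """Assemble a `tensile analyze` argument list (hypothetical helper)."""
    cmd = ["tensile", "analyze", str(trajectory_path)]
    if hints_file is not None:
        cmd += ["--hints-file", str(hints_file)]
    return cmd


def plan_analysis_commands(trajectory_paths, hints_file):
    """First trajectory is analyzed without hints; later ones reuse hints_file."""
    commands = []
    for index, path in enumerate(trajectory_paths):
        hints = None if index == 0 else hints_file
        commands.append(build_analyze_command(path, hints))
    return commands
```

Each command can be run with `subprocess.run(cmd, check=True)` after the corresponding agent run has written its trajectory.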
