Reports

What is a Report?

A report performs root cause analysis on an evaluation run. Where an evaluation scores each simulation individually, a report looks across the whole run to find the patterns driving failures and recommends concrete edits to fix them.

Reports answer:

What are the most common failure modes across the run?
What’s driving each pattern?
Which prompts, tool docstrings, or tool behaviors need to change?
What specific edit would have the most impact?

Report Contents

Each report has four sections:

Scenario pass rate

The headline number: how many of the run’s scenarios passed all of their assertions, out of the total number of scenarios.
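
As a rough illustration of how a headline number like this can be derived, the sketch below counts a scenario as passed only when every one of its assertions passed. The ScenarioResult record, its field names, and the sample scenarios are hypothetical and are not part of the Veris data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScenarioResult:
    # Hypothetical record: one scenario and the outcome of each of its assertions.
    name: str
    assertion_results: List[bool]


def scenario_pass_rate(results: List[ScenarioResult]) -> Tuple[int, int]:
    """Return (passed, total). A scenario passes only if all assertions passed.

    Note: a scenario with no assertions counts as passed, because all([]) is True.
    """
    passed = sum(1 for r in results if all(r.assertion_results))
    return passed, len(results)


# Made-up sample data for illustration only.
results = [
    ScenarioResult("refund_request", [True, True, True]),
    ScenarioResult("address_change", [True, False]),
    ScenarioResult("order_lookup", [True]),
]

passed, total = scenario_pass_rate(results)
print(f"{passed}/{total} scenarios passed")  # prints "2/3 scenarios passed"
```

A single failed assertion is enough to fail the whole scenario, which is why this number can look low even when most individual assertions pass.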

Grader results by category

Per-category scores across each of the grader’s categories (information gathering, tool execution, data accuracy, and so on), so you can see where the agent is strong and where it’s weak.

Top issues

A ranked list of the most impactful issues surfaced by the grader, each with its category and how many simulations it affected.
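
To make the idea of a ranked issue list concrete, here is a minimal sketch that orders issues by the number of simulations they affected. It assumes impact is measured by that count alone; how the report actually weighs issues is not specified here, and the issue texts below are invented.

```python
from collections import Counter

# Hypothetical grader output: one (category, issue) pair per affected simulation.
flagged = [
    ("tool execution", "retries a failed call with identical arguments"),
    ("data accuracy", "quotes a stale order total"),
    ("tool execution", "retries a failed call with identical arguments"),
    ("information gathering", "skips identity verification"),
    ("tool execution", "retries a failed call with identical arguments"),
    ("data accuracy", "quotes a stale order total"),
]


def rank_issues(pairs):
    """Return [((category, issue), count), ...] ordered by count, highest first."""
    return Counter(pairs).most_common()


for (category, issue), count in rank_issues(flagged):
    print(f"{count:>2}  [{category}] {issue}")
```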

Suggested fixes

Concrete changes the report recommends, grouped by where they apply:

System Prompt Fix — edits to your agent’s system prompt.
Tool Docstring Fix — changes to a tool’s description so the model uses it more correctly.
Tool Code Fix — changes to the tool’s implementation.

Each fix has a title, a short description, how many simulations it would help (count and percentage), and a unified diff of the change. A Copy Prompt button grabs the fix as a prompt you can paste into your coding agent.
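
For readers unfamiliar with the unified diff format, the sketch below builds a fix record with the fields listed above and renders the change with Python's standard difflib module. The dictionary keys, the file name system_prompt.txt, and the prompt text are assumptions made for illustration; they do not describe the report's actual schema.

```python
import difflib


def make_fix(title, description, before, after, helped, total,
             path="system_prompt.txt"):
    """Bundle a hypothetical fix record with a unified diff of the change."""
    diff = "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return {
        "title": title,
        "description": description,
        "simulations_helped": helped,
        "percent_helped": round(100 * helped / total, 1),
        "diff": diff,
    }


# Made-up before/after versions of a system prompt.
before = (
    "You are a support agent.\n"
    "Answer the customer's question.\n"
)
after = (
    "You are a support agent.\n"
    "Verify the customer's identity before answering.\n"
    "Answer the customer's question.\n"
)

fix = make_fix(
    title="Require identity verification",
    description="Ask the agent to verify identity before answering.",
    before=before,
    after=after,
    helped=7,
    total=20,
)

print(f"{fix['title']}: helps {fix['simulations_helped']} simulations "
      f"({fix['percent_helped']}%)")
print(fix["diff"])
```

In the printed diff, lines starting with "+" are additions and lines starting with "-" are removals, which is the same convention the report's diffs follow as a unified diff.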

Generating a Report

From the CLI


# Interactive mode: prompts for the evaluation run
veris reports create

# With explicit flags
veris reports create --eval-run-id eval_abc123

# Check status
veris reports status REPORT_ID --watch

# Download
veris reports get REPORT_ID -o report.html
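
If you want to drive these commands from a script, one possible approach is to build the argument lists in Python and hand them to subprocess. The sketch below only assembles and prints the commands shown above; the helper name report_commands is hypothetical, and REPORT_ID stays a placeholder because the output format of veris reports create is not covered here.

```python
import shlex


def report_commands(eval_run_id, report_id, output="report.html"):
    """Build argument lists for the create, status, and get commands shown above."""
    return [
        ["veris", "reports", "create", "--eval-run-id", eval_run_id],
        ["veris", "reports", "status", report_id, "--watch"],
        ["veris", "reports", "get", report_id, "-o", output],
    ]


# Dry run: print each command instead of executing it.
for argv in report_commands("eval_abc123", "REPORT_ID"):
    print(shlex.join(argv))
```

To execute a command for real, pass its argument list to subprocess.run(argv, check=True) once you have an actual report ID.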

From the Console

Navigate to Reports and click Generate Report. Pick a completed evaluation run and confirm. Once the report finishes, click it to view it in the browser.

Iterating on Results

The typical loop:

Read the report to understand what’s driving failures.
Apply a suggested fix to your agent.
Push a new version: veris env push --tag v1.1
Re-run simulations and evaluations.
Generate a new report on the new run.

Over time you build a history of reports that tracks your agent’s quality across versions.
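
One simple way to use that history is to record each report's scenario pass rate per version and look at the change between consecutive versions. The sketch below assumes you copy the numbers out of each report by hand; the version tags other than v1.1 and all of the counts are made up.

```python
# Hypothetical history: (version tag, scenarios passed, total scenarios),
# taken from the headline number of successive reports.
history = [
    ("v1.0", 12, 20),
    ("v1.1", 15, 20),
    ("v1.2", 14, 20),
]


def pass_rate_deltas(history):
    """Return (tag, pass rate in percent, change vs. previous version) per entry."""
    rows = []
    previous = None
    for tag, passed, total in history:
        rate = 100 * passed / total
        delta = None if previous is None else rate - previous
        rows.append((tag, rate, delta))
        previous = rate
    return rows


for tag, rate, delta in pass_rate_deltas(history):
    change = "n/a" if delta is None else f"{delta:+.1f} pts"
    print(f"{tag}: {rate:.1f}% ({change})")
```

A negative change, like the drop in the last made-up entry, is a signal to generate a report on that run and check whether a recent fix introduced a new failure pattern.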

CLI Commands


# Generate a report
veris reports create [--env-id ID] [--eval-run-id ID]

# List all reports
veris reports list [--env-id ID]

# Check report status
veris reports status REPORT_ID [--watch]

# Download report
veris reports get REPORT_ID [-o OUTPUT_FILE]