Reports

What is a Report?

A report goes beyond raw evaluation scores. It analyzes evaluation results alongside your agent’s code to perform root cause analysis — identifying why failures happen, not just that they happened.

Reports answer questions like:

  • What are the most common failure modes across all simulations?
  • What is the root cause of each failure pattern?
  • Which specific prompts, tools, or code paths need to change?
  • What concrete steps would have the most impact on agent quality?

How Reports Work

The report engine takes two inputs:

  1. Evaluation results — per-simulation grading scores, assertion outcomes, and findings
  2. Agent code — your agent’s source code, prompts, and tool definitions

By cross-referencing failure patterns in evaluations with the actual code, the report engine can pinpoint specific issues and generate targeted fixes.

Report Contents

Reports are generated as HTML documents that include:

Executive Summary

Overall pass/fail rates, key metrics, and a high-level assessment of agent quality across all simulations in the evaluation run.

Failure Analysis

Categorized breakdown of failure modes with examples from specific simulations. Each failure is traced back to its root cause:

  • Prompt issues — Missing instructions, ambiguous directives, conflicting guidance
  • Tool usage problems — Wrong API calls, missing parameters, incorrect sequencing
  • Context management — Lost context between turns, failure to track state
  • Harness gaps — Missing error handling, inadequate fallback logic
  • Knowledge gaps — Agent lacks information needed to complete the task
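To make the categorization concrete, here is a minimal sketch of tallying failures by root-cause category to see which one dominates. The `Failure` record, its field names, and the category labels are assumptions made for illustration; the report's actual data format is not specified here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Failure:
    """Hypothetical failure record; not the report's actual schema."""
    simulation_id: str
    category: str  # assumed labels mirroring the five categories above
    summary: str


def tally_by_category(failures):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(f.category for f in failures).most_common()


# Illustrative data only, not output from a real evaluation run.
failures = [
    Failure("sim_001", "prompt", "Missing instruction for refund limits"),
    Failure("sim_002", "tool_usage", "Required parameter omitted"),
    Failure("sim_003", "prompt", "Conflicting guidance on escalation"),
    Failure("sim_004", "context", "Lost order ID between turns"),
]

for category, count in tally_by_category(failures):
    print(f"{category}: {count}")
```

A tally like this is only a starting point: the category with the most failures is usually where reading the traced root causes pays off first.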

Recommendations

Actionable suggestions organized by impact and effort. Each recommendation includes:

  • What to change — specific file, function, or prompt section
  • Why — the root cause it addresses, with examples from failed simulations
  • How — concrete code or prompt changes, including diffs where applicable

Recommendations cover:

  • Prompt rewrites and additions
  • Tool configuration changes
  • Harness improvements (error handling, retries, validation)
  • Context management strategies
  • Missing capability identification
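One way to read "organized by impact and effort" is as a sort: highest impact first, with lower effort breaking ties. The sketch below illustrates that ordering under stated assumptions. The `Recommendation` class, the three-level `LEVELS` scale, and the sample entries are all hypothetical and may not match how the report actually ranks its output.

```python
from dataclasses import dataclass

# Assumed three-level scale; the report may express impact and effort differently.
LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Recommendation:
    """Hypothetical shape following the What / Why / How structure above."""
    what: str
    why: str
    how: str
    impact: str
    effort: str


def prioritize(recommendations):
    """Sort by highest impact first, then by lowest effort."""
    return sorted(
        recommendations,
        key=lambda r: (-LEVELS[r.impact], LEVELS[r.effort]),
    )


# Illustrative entries only.
recs = [
    Recommendation("retry wrapper in harness", "tool timeouts", "add retries", "medium", "low"),
    Recommendation("system prompt, refunds section", "missing instruction", "add rule", "high", "low"),
    Recommendation("tool schema for search", "missing parameter", "mark field required", "high", "medium"),
]

for rec in prioritize(recs):
    print(f"[impact={rec.impact}, effort={rec.effort}] {rec.what}")
```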

Auto-Apply

Recommendations can be automatically applied to your agent’s code. In the Console, click the Apply button next to a recommendation to have the changes made directly. This creates a new version of your agent that you can immediately test with another simulation run.

Auto-apply edits your agent code directly, so review the changes before pushing a new version. To revert, push a previous image tag.

Generating a Report

From the CLI

    # Interactive mode: prompts for the evaluation run
    veris reports create

    # With explicit flags
    veris reports create --eval-run-id eval_abc123

    # Check status
    veris reports status REPORT_ID --watch

    # Download as HTML
    veris reports get REPORT_ID -o report.html
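If you want to drive the commands above from a script, a small wrapper that builds the argument lists can help keep the flags consistent. This is a sketch, not part of the Veris tooling: the helper names `create_cmd`, `status_cmd`, and `get_cmd` are hypothetical, and only the subcommands and flags shown above are used.

```python
import shlex


def create_cmd(eval_run_id=None):
    """Build `veris reports create`; with no ID the CLI prompts interactively."""
    cmd = ["veris", "reports", "create"]
    if eval_run_id is not None:
        cmd += ["--eval-run-id", eval_run_id]
    return cmd


def status_cmd(report_id, watch=False):
    """Build `veris reports status REPORT_ID`, optionally with --watch."""
    cmd = ["veris", "reports", "status", report_id]
    if watch:
        cmd.append("--watch")
    return cmd


def get_cmd(report_id, output=None):
    """Build `veris reports get REPORT_ID`, optionally writing to a file."""
    cmd = ["veris", "reports", "get", report_id]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# REPORT_ID is a placeholder, as in the commands above.
for cmd in (
    create_cmd("eval_abc123"),
    status_cmd("REPORT_ID", watch=True),
    get_cmd("REPORT_ID", output="report.html"),
):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it. How `create` reports the new report's ID is not covered here, so the sketch leaves the ID as a placeholder rather than chaining the steps automatically.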

From the Console

Navigate to Reports and click Generate Report. Select the completed run and evaluation run, then confirm. Once the report is complete, click it to view it in the browser. Recommendations with auto-apply buttons appear inline.

Iterating on Results

Reports close the feedback loop:

  1. Read the report to understand failure root causes
  2. Apply recommendations (manually or with auto-apply)
  3. Push a new version: veris env push --tag v1.1
  4. Re-run simulations and evaluations
  5. Generate a new report and compare with the previous one

Over time, you build a history of reports that tracks your agent’s improvement across versions.
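Comparing two reports can be as simple as diffing failure counts per category. The sketch below assumes you have already extracted those counts by hand or by your own tooling. The dictionaries, category names, and numbers are made up for illustration, and `compare_failure_counts` is a hypothetical helper, not a Veris feature.

```python
def compare_failure_counts(previous, current):
    """Return the change in failure count per category (negative means fewer failures)."""
    categories = sorted(set(previous) | set(current))
    return {c: current.get(c, 0) - previous.get(c, 0) for c in categories}


# Illustrative counts only, not results from real reports.
v1_0 = {"prompt": 5, "tool_usage": 3, "context": 2}
v1_1 = {"prompt": 1, "tool_usage": 3, "harness": 1}

for category, delta in compare_failure_counts(v1_0, v1_1).items():
    print(f"{category}: {delta:+d}")
```

A category that appears only in the newer report, like `harness` in this made-up data, is worth a close look: a fix in one area can surface or introduce failures in another.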

CLI Commands

    # Generate a report
    veris reports create [--env-id ID] [--eval-run-id ID]

    # List all reports
    veris reports list [--env-id ID]

    # Check report status
    veris reports status REPORT_ID [--watch]

    # Download report
    veris reports get REPORT_ID [-o OUTPUT_FILE]