Reports
What is a Report?
A report goes beyond raw evaluation scores. It analyzes evaluation results alongside your agent’s code to perform root cause analysis — identifying why failures happen, not just that they happened.
Reports answer questions like:
- What are the most common failure modes across all simulations?
- What is the root cause of each failure pattern?
- Which specific prompts, tools, or code paths need to change?
- What concrete steps would have the most impact on agent quality?
How Reports Work
The report engine takes two inputs:
- Evaluation results — per-simulation grading scores, assertion outcomes, and findings
- Agent code — your agent’s source code, prompts, and tool definitions
By cross-referencing failure patterns in evaluations with the actual code, the report engine can pinpoint specific issues and generate targeted fixes.
Report Contents
Reports are generated as HTML documents that include:
Executive Summary
Overall pass/fail rates, key metrics, and a high-level assessment of agent quality across all simulations in the evaluation run.
Failure Analysis
Categorized breakdown of failure modes with examples from specific simulations. Each failure is traced back to its root cause:
- Prompt issues — Missing instructions, ambiguous directives, conflicting guidance
- Tool usage problems — Wrong API calls, missing parameters, incorrect sequencing
- Context management — Lost context between turns, failure to track state
- Harness gaps — Missing error handling, inadequate fallback logic
- Knowledge gaps — Agent lacks information needed to complete the task
Recommendations
Actionable suggestions organized by impact and effort. Each recommendation includes:
- What to change — specific file, function, or prompt section
- Why — the root cause it addresses, with examples from failed simulations
- How — concrete code or prompt changes, including diffs where applicable
Recommendations cover:
- Prompt rewrites and additions
- Tool configuration changes
- Harness improvements (error handling, retries, validation)
- Context management strategies
- Missing capability identification
Auto-Apply
Recommendations can be automatically applied to your agent’s code. In the Console, click the Apply button next to a recommendation to have the changes made directly. This creates a new version of your agent that you can immediately test with another simulation run.
Auto-apply modifies your agent code based on the recommendation. Review the changes before pushing a new version. You can always revert by pushing a previous image tag.
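A typical review-and-revert flow might look like the following sketch. It uses only the `veris env push --tag` command shown later on this page; the tag names (`v1.1`, `v1.0`) are illustrative placeholders, not required values.

```shell
# After auto-apply creates a new version of your agent, push it
# under a new tag so you can test it with another simulation run.
veris env push --tag v1.1

# If the applied changes regress your agent, revert by pushing
# the previous image tag again.
veris env push --tag v1.0
```

Because each push is tied to an image tag, rolling back is just a matter of re-pushing whichever tag last behaved well.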
Generating a Report
From the CLI
# Interactive mode — prompts for evaluation run
veris reports create
# With explicit flags
veris reports create --eval-run-id eval_abc123
# Check status
veris reports status REPORT_ID --watch
# Download as HTML
veris reports get REPORT_ID -o report.html
From the Console
Navigate to Reports and click Generate Report. Select the completed run and evaluation run, then confirm. Once complete, click the report to view it in the browser. Recommendations with auto-apply buttons appear inline.
Iterating on Results
Reports close the feedback loop:
- Read the report to understand failure root causes
- Apply recommendations (manually or with auto-apply)
- Push a new version:
veris env push --tag v1.1
- Re-run simulations and evaluations
- Generate a new report and compare with the previous one
Over time, you build a history of reports that tracks your agent’s improvement across versions.
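One full iteration of this loop can be sketched as a shell sequence. It combines only the commands documented on this page; the evaluation run ID, `REPORT_ID`, and the tag are placeholders taken from the earlier examples.

```shell
# 1. Generate a report for a completed evaluation run
veris reports create --eval-run-id eval_abc123

# 2. Wait for the report to finish, then download it for review
veris reports status REPORT_ID --watch
veris reports get REPORT_ID -o report.html

# 3. Apply the recommendations (manually or via auto-apply in the
#    Console), then push the updated agent as a new version
veris env push --tag v1.1

# 4. Re-run simulations and evaluations against v1.1, generate a new
#    report, and compare it with report.html from the previous version
```

Keeping the downloaded HTML reports per tag gives you the version-by-version history described above.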
CLI Commands
# Generate a report
veris reports create [--env-id ID] [--eval-run-id ID]
# List all reports
veris reports list [--env-id ID]
# Check report status
veris reports status REPORT_ID [--watch]
# Download report
veris reports get REPORT_ID [-o OUTPUT_FILE]