Scenarios & Graders
What is a Scenario?
A scenario packages one specific situation you want to put your agent into, plus what “handled it correctly” means for that situation. Think of it like a briefing you’d hand someone walking into a case cold — the state of things right now, what’s unfolding, and what a good resolution looks like:
- Starting state — what the mock services know when the simulation starts (Maria Chen has two $47.50 charges from Bella Cucina on March 5), plus any broader context (a payment-processor outage, a market holiday). This is the world the agent wakes up into.
- Actor — whoever triggers the agent and keeps the interaction going: a persona with objectives (a customer pursuing a refund), an incoming webhook payload, or a scheduled job. For multi-turn scenarios, the actor drives the conversation toward its objectives until they’re met or the turn budget runs out.
- Success criteria — boolean checks evaluated against the transcript and final service state. They define what “handled correctly” means for this scenario.
One scenario executed end-to-end against your agent is a simulation.
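One way to picture the three parts is as a plain data structure. The sketch below is illustrative only: the `Scenario` and `Actor` classes, every field name, and the wording of the success criteria are hypothetical and are not the Veris scenario schema. Only the Maria Chen charges and the refund-seeking persona come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Actor:
    """Whoever triggers the agent (hypothetical shape)."""
    kind: str                                   # e.g. "persona", "webhook", "scheduled_job"
    objectives: list[str] = field(default_factory=list)
    max_turns: int = 10                         # assumed turn budget for multi-turn runs


@dataclass
class Scenario:
    """One situation plus what handling it correctly means (hypothetical shape)."""
    starting_state: dict[str, Any]              # what the mock services know at start
    actor: Actor
    success_criteria: list[str]                 # boolean checks, described in plain language


duplicate_charge = Scenario(
    starting_state={
        "charges": [
            {"customer": "Maria Chen", "merchant": "Bella Cucina",
             "amount": 47.50, "date": "March 5"},
            {"customer": "Maria Chen", "merchant": "Bella Cucina",
             "amount": 47.50, "date": "March 5"},
        ],
    },
    actor=Actor(kind="persona", objectives=["get a refund for the duplicate charge"]),
    success_criteria=[
        "exactly one of the two charges is refunded",
        "the agent confirms the refund to the customer",
    ],
)
```

Keeping the starting state, the actor, and the criteria side by side makes it easy to see that the criteria only make sense relative to that particular starting state.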
Assertions
Assertions define what “success” means for an individual scenario — two or three independent boolean checks, evaluated after the simulation completes.
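As a rough mental model, each assertion can be thought of as a function that returns true or false given the transcript and the final service state. The sketch below assumes a hypothetical `SimulationResult` shape and hand-written check functions. It is not how Veris represents or evaluates assertions, only an illustration of "independent boolean checks, evaluated after the simulation completes".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimulationResult:
    """Hypothetical result of one simulation."""
    transcript: list[str]   # messages exchanged, in order
    final_state: dict       # mock-service state after the run


Assertion = Callable[[SimulationResult], bool]


def one_charge_refunded(result: SimulationResult) -> bool:
    """True when exactly one charge ended up refunded."""
    refunded = [c for c in result.final_state["charges"] if c["refunded"]]
    return len(refunded) == 1


def refund_confirmed(result: SimulationResult) -> bool:
    """True when any message in the transcript mentions a refund."""
    return any("refund" in message.lower() for message in result.transcript)


def evaluate(result: SimulationResult, assertions: list[Assertion]) -> dict[str, bool]:
    """Run each check independently; a failing check does not skip the others."""
    return {check.__name__: check(result) for check in assertions}


result = SimulationResult(
    transcript=["I was charged twice.", "I've issued a refund for the duplicate charge."],
    final_state={"charges": [{"refunded": True}, {"refunded": False}]},
)
outcome = evaluate(result, [one_charge_refunded, refund_confirmed])
print(outcome)
```

Because every check is evaluated on its own, a run can pass one assertion and fail another, which shows where the agent went wrong more precisely than a single pass/fail flag would.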
Scenario sets
Scenarios are organized into scenario sets — collections generated or uploaded together. veris scenarios create produces a scenario set containing multiple scenarios plus a grader that matches them.
Graders
Where assertions check scenario-specific outcomes, a grader checks general agent behaviors across a scenario set — hallucination, tool-use correctness, communication quality, and similar patterns. Graders read the same simulation trace assertions do; they just score things that aren’t tied to any one scenario.
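The difference in scope can be sketched with a toy rubric. Everything here is an assumption made for illustration: the trace shape, the `tool_use_score` rubric, and the averaging are hypothetical and far simpler than a real grader. The point is only that the same rubric is applied to every trace in the set, regardless of which scenario produced it.

```python
from statistics import mean


def tool_use_score(trace: dict) -> float:
    """Fraction of tool calls in one trace that completed without error."""
    calls = trace["tool_calls"]
    if not calls:
        return 1.0  # no tool calls means nothing to get wrong
    return sum(1 for call in calls if call["ok"]) / len(calls)


def grade_set(traces: list[dict]) -> float:
    """Apply the same rubric to every trace in the set and average the result."""
    return mean(tool_use_score(trace) for trace in traces)


# Hypothetical traces from two different scenarios in one scenario set.
traces = [
    {"scenario": "duplicate-charge", "tool_calls": [{"ok": True}, {"ok": True}]},
    {"scenario": "processor-outage", "tool_calls": [{"ok": True}, {"ok": False}]},
]
set_score = grade_set(traces)
print(set_score)
```

Nothing in `tool_use_score` refers to refunds or outages, which is what separates it from a scenario-specific assertion.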
Generating scenarios
veris scenarios create
veris scenarios create --num 10
veris scenarios status <SET_ID> --watch
Generation analyzes your agent’s code, identifies its capabilities and service integrations, and produces scenarios that exercise different code paths and edge cases, plus a grader tuned to the set.
Scenario types
When generating from the console, you can bias the distribution toward one type:
| Type | What it covers |
|---|---|
| Mixed (default) | A natural distribution across the other types |
| Simple | Straightforward happy-path interactions |
| Complex | Multi-step, multi-service, or multi-actor interactions |
| Error Handling | Service failures, bad data, tool errors the agent needs to recover from |
| Edge Case | Unusual or rare-but-legitimate situations |
| Adversarial | Deceptive, hostile, or agent-breaking actor behavior |
| Out of Scope | Requests the agent shouldn’t or can’t handle |
The CLI’s veris scenarios create currently uses the default Mixed distribution.
Sources
Beyond the default code analysis, generation can draw from other sources:
- Prompt / Docs — describe the scenarios you want in plain language, optionally attaching reference documents (a policy PDF, a runbook). Generation grounds the set in your description and any docs you attach.
- Production traces — turn your agent’s real usage into scenarios. See below.
From production traces
If you already run your agent in production behind a tracing provider, you can generate scenarios straight from real usage: the actual things users ask, where the agent struggled, and the language people really use. You point Veris at the traces you want by pasting the same request you’d run against your provider’s API.
Langfuse
The Langfuse API gives you four endpoints, depending on what you’re fetching:
- A filtered set — list traces or list sessions
- A single record by id — trace or session
To pull traces from one of these endpoints:
1. Open the API reference for the endpoint you want and use its Test Request panel to build your query (filter by name, tags, time range), then run it to confirm it returns what you expect.
2. Copy the curl command the page generates.
3. In the console, add a From Traces source and paste the curl into the request box.
Veris runs the query, pulls the sessions, and mines them into scenarios that capture the intents behind real conversations, the failures worth catching, and how users actually phrase things.
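For a sense of what the pasted request looks like, the sketch below assembles a curl command for the list-traces endpoint. It rests on assumptions that should be checked against the Langfuse API reference: the Langfuse Cloud host, the `/api/public/traces` path, the `name`, `fromTimestamp`, and `limit` query parameters, and HTTP Basic auth with the public key as username and the secret key as password. The `build_traces_curl` helper, the keys, and the `support-chat` trace name are hypothetical placeholders, and the script only builds a string without sending a request.

```python
from urllib.parse import urlencode

# Assumptions: Langfuse Cloud host and the public "list traces" endpoint.
BASE_URL = "https://cloud.langfuse.com"
LIST_TRACES_PATH = "/api/public/traces"


def build_traces_curl(public_key: str, secret_key: str, **filters: str) -> str:
    """Return a curl command that lists traces matching the given filters."""
    query = urlencode(filters)  # percent-encodes values such as timestamps
    url = f"{BASE_URL}{LIST_TRACES_PATH}"
    if query:
        url = f"{url}?{query}"
    # Assumed auth scheme: Basic auth, public key as user, secret key as password.
    return f"curl -u '{public_key}:{secret_key}' '{url}'"


command = build_traces_curl(
    "pk-lf-EXAMPLE",                       # placeholder public key
    "sk-lf-EXAMPLE",                       # placeholder secret key
    name="support-chat",                   # hypothetical trace name
    fromTimestamp="2025-03-01T00:00:00Z",
    limit="50",
)
print(command)
```

In practice the Test Request panel generates this command for you; the sketch is only meant to show which parts of the request carry the filters.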
CLI Commands
# Generate a scenario set and a matching grader
veris scenarios create [--num N] [--env-id ID]
# Check generation progress
veris scenarios status <SET_ID> [--watch]
# List scenario sets
veris scenarios list [--env-id ID]
# Open in console
veris scenarios get <SET_ID>
# Delete a scenario set
veris scenarios delete <SET_ID>
Using the console
The Scenarios page lists all scenario sets with their title, status, scenario count, and the grader mapped to the set. Click a scenario set to view individual scenarios and their details.