Scenarios & Graders

What is a Scenario?

A scenario packages one specific situation you want to put your agent into, plus what “handled it correctly” means for that situation. Think of it like a briefing you’d hand someone walking into a case cold — the state of things right now, what’s unfolding, and what a good resolution looks like:

  • Starting state — what the mock services know when the simulation starts (Maria Chen has two $47.50 charges from Bella Cucina on March 5), plus any broader context (a payment-processor outage, a market holiday). This is the world the agent wakes up into.
  • Actor — whoever triggers the agent and keeps the interaction going. A persona with objectives (a customer pursuing a refund), an incoming webhook payload, a scheduled job. For multi-turn scenarios the actor drives the conversation toward its objectives until they’re met or the turn budget runs out.
  • Success criteria — boolean checks evaluated against the transcript and final service state. They define what “handled correctly” means for this scenario.

One scenario executed end-to-end against your agent is a simulation.
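The three parts above, and the multi-turn loop that turns a scenario into a simulation, can be sketched in a few lines of Python. This is only an illustration under assumptions: `Scenario`, `run_simulation`, `toy_actor`, and `toy_agent` are hypothetical names, the state layout is invented for the example, and none of it is the Veris scenario schema.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names throughout; this is not the Veris scenario schema.
Transcript = List[Tuple[str, str]]  # (speaker, message) pairs
State = Dict[str, object]


@dataclass
class Scenario:
    starting_state: State          # what the mock services know at the start
    actor_objective: str           # what the actor is trying to achieve
    success_criteria: List[str] = field(default_factory=list)  # plain-language checks
    max_turns: int = 5             # turn budget for multi-turn scenarios


def run_simulation(
    scenario: Scenario,
    actor: Callable[[Transcript, State], Optional[str]],
    agent: Callable[[str, State], str],
) -> Tuple[Transcript, State]:
    """Run one scenario end-to-end; return the transcript and the final state."""
    state = copy.deepcopy(scenario.starting_state)  # leave the scenario itself untouched
    transcript: Transcript = []
    for _ in range(scenario.max_turns):
        message = actor(transcript, state)
        if message is None:        # the actor reports that its objectives are met
            break
        transcript.append(("actor", message))
        transcript.append(("agent", agent(message, state)))
    return transcript, state


charge = {"customer": "Maria Chen", "merchant": "Bella Cucina", "amount": 47.50, "date": "March 5"}
scenario = Scenario(
    starting_state={"charges": [dict(charge), dict(charge)]},
    actor_objective="Get the duplicate charge refunded",
    success_criteria=["Exactly one of the two charges is refunded"],
)


def toy_actor(transcript: Transcript, state: State) -> Optional[str]:
    # Keep asking until a refund shows up in the service state.
    if state.get("refunds"):
        return None
    return "I was charged twice by Bella Cucina. Please refund one charge."


def toy_agent(message: str, state: State) -> str:
    # Stand-in for the real agent: refunds the second charge.
    state["refunds"] = [state["charges"][1]]
    return "I have refunded the duplicate $47.50 charge."


transcript, final_state = run_simulation(scenario, toy_actor, toy_agent)
```

The loop stops either when the actor has nothing left to ask or when `max_turns` is exhausted, which mirrors the "objectives met or turn budget runs out" rule described for multi-turn scenarios.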

Assertions

Assertions define what “success” means for an individual scenario — two or three independent boolean checks, evaluated after the simulation completes.
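As a rough sketch of what "independent boolean checks" could look like, the snippet below evaluates each assertion separately against a transcript and a final state. The transcript and state shapes, the assertion names, and the choices to count a check that raises as failed and to require every check to pass are all assumptions made for illustration, not documented Veris behavior.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes; the real transcript and state formats may differ.
Transcript = List[Tuple[str, str]]  # (speaker, message) pairs
State = Dict[str, object]
Assertion = Callable[[Transcript, State], bool]


def evaluate_assertions(
    assertions: Dict[str, Assertion],
    transcript: Transcript,
    final_state: State,
) -> Dict[str, bool]:
    """Evaluate every assertion on its own; one failure does not skip the rest."""
    results: Dict[str, bool] = {}
    for name, check in assertions.items():
        try:
            results[name] = bool(check(transcript, final_state))
        except Exception:
            results[name] = False  # assumption: a check that cannot run counts as failed
    return results


assertions: Dict[str, Assertion] = {
    "one_refund_issued": lambda t, s: len(s.get("refunds", [])) == 1,
    "refund_amount_matches_charge": lambda t, s: s["refunds"][0]["amount"] == 47.50,
    "agent_confirmed_refund": lambda t, s: any(
        who == "agent" and "refund" in text.lower() for who, text in t
    ),
}

transcript = [
    ("actor", "I was charged twice by Bella Cucina."),
    ("agent", "I have issued a refund for the duplicate charge."),
]
final_state = {"refunds": [{"amount": 47.50}]}

results = evaluate_assertions(assertions, transcript, final_state)
scenario_passed = all(results.values())  # assumption: every assertion must hold
```

Keeping the per-assertion results, rather than a single pass/fail flag, makes it easy to see which part of the outcome the agent got wrong.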

Scenario sets

Scenarios are organized into scenario sets — collections generated or uploaded together. veris scenarios create produces a scenario set containing multiple scenarios plus a grader that matches them.

Graders

Where assertions check scenario-specific outcomes, a grader checks general agent behaviors across a scenario set — hallucination, tool-use correctness, communication quality, and similar patterns. Graders read the same simulation trace assertions do; they just score things that aren’t tied to any one scenario.
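To make the contrast with assertions concrete, here is a minimal sketch of a grader-style check: the same rule applied to every simulation trace in a set and then averaged. The trace shape, the tool names, and the scoring rule are hypothetical. A real grader covers more behaviors than this one tool-use check and may score them quite differently.

```python
from typing import Dict, List

# Hypothetical trace shape: each trace lists the tool calls the agent made.
Trace = Dict[str, object]

KNOWN_TOOLS = {"lookup_charges", "issue_refund"}  # assumed tool names


def tool_use_score(trace: Trace) -> float:
    """Fraction of tool calls in one trace that name a known tool."""
    calls = trace.get("tool_calls", [])
    if not calls:
        return 1.0  # no tool calls, so nothing to get wrong
    valid = sum(1 for call in calls if call["name"] in KNOWN_TOOLS)
    return valid / len(calls)


def grade_set(traces: List[Trace]) -> Dict[str, float]:
    """Apply the same behavioral check to every simulation and average it."""
    if not traces:
        raise ValueError("cannot grade an empty scenario set")
    scores = [tool_use_score(trace) for trace in traces]
    return {"tool_use_correctness": sum(scores) / len(scores)}


traces = [
    {"scenario": "duplicate-charge",
     "tool_calls": [{"name": "lookup_charges"}, {"name": "issue_refund"}]},
    {"scenario": "processor-outage",
     "tool_calls": [{"name": "lookup_charges"}, {"name": "reset_ledger"}]},
]
report = grade_set(traces)
```

Nothing in `tool_use_score` refers to refunds or to any particular customer, which is the point: the check applies unchanged to every scenario in the set.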

Generating scenarios

veris scenarios create
veris scenarios create --num 10
veris scenarios status <SET_ID> --watch

Generation analyzes your agent’s code, identifies its capabilities and service integrations, and produces scenarios that exercise different code paths and edge cases — plus a grader tuned to the set.

Scenario types

When generating from the console, you can bias the distribution toward one type:

Type             What it covers
Mixed (default)  A natural distribution across the other types
Simple           Straightforward happy-path interactions
Complex          Multi-step, multi-service, or multi-actor interactions
Error Handling   Service failures, bad data, tool errors the agent needs to recover from
Edge Case        Unusual or rare-but-legitimate situations
Adversarial      Deceptive, hostile, or agent-breaking actor behavior
Out of Scope     Requests the agent shouldn’t or can’t handle

The CLI’s veris scenarios create command currently uses the default Mixed distribution.

Sources

Beyond the default code analysis, generation can draw from other sources:

  • Prompt / Docs — describe the scenarios you want in plain language, optionally attaching reference documents (a policy PDF, a runbook). Generation grounds the set in your description and any docs you attach.
  • Production traces — turn your agent’s real usage into scenarios. See below.

From production traces

If you already run your agent in production behind a tracing provider, you can generate scenarios straight from real usage: the actual things users ask, where the agent struggled, and the language people really use. You point Veris at the traces you want by pasting the same request you’d run against your provider’s API.

Langfuse

The Langfuse API gives you four endpoints, depending on what you’re fetching. Whichever one you use, the steps are the same:

  1. Open the API reference for the endpoint you want and use its Test Request panel to build your query (filter by name, tags, time range), then run it to confirm it returns what you expect.
  2. Copy the curl command the page generates.
  3. In the console, add a From Traces source and paste the curl into the request box.

Veris runs the query, pulls the sessions, and mines them into scenarios that capture the intents behind real conversations, the failures worth catching, and how users actually phrase things.
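For orientation, the request you paste is an ordinary authenticated GET with filter parameters. The sketch below builds an equivalent request in Python without sending it. The endpoint path (`/api/public/traces`), the parameter names (`name`, `tags`, `fromTimestamp`, `limit`), and Basic authentication with the public key as username and the secret key as password reflect one reading of the Langfuse public API and should be checked against its API reference. The host, keys, and filter values are placeholders, and `build_traces_request` is a hypothetical helper, not part of Veris or Langfuse.

```python
import base64
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_traces_request(
    host: str,
    public_key: str,
    secret_key: str,
    name: Optional[str] = None,
    tags: Optional[List[str]] = None,
    from_timestamp: Optional[str] = None,
    limit: int = 50,
) -> Request:
    """Build (but do not send) a GET request for a filtered list of traces."""
    params = [("limit", str(limit))]
    if name:
        params.append(("name", name))
    for tag in tags or []:
        params.append(("tags", tag))  # one repeated parameter per tag
    if from_timestamp:
        params.append(("fromTimestamp", from_timestamp))
    url = f"{host.rstrip('/')}/api/public/traces?{urlencode(params)}"
    # Assumed auth scheme: HTTP Basic, public key as user, secret key as password.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})


request = build_traces_request(
    host="https://cloud.langfuse.com",
    public_key="pk-lf-demo",          # placeholder credentials
    secret_key="sk-lf-demo",
    name="refund-agent",              # placeholder filter values
    tags=["production"],
    from_timestamp="2024-03-01T00:00:00Z",
)
```

Passing `request` to `urllib.request.urlopen` would run the query. Narrowing the filters first, as step 1 suggests, keeps the resulting scenario set focused on the traffic you care about.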

CLI Commands

# Generate a scenario set and a matching grader
veris scenarios create [--num N] [--env-id ID]

# Check generation progress
veris scenarios status <SET_ID> [--watch]

# List scenario sets
veris scenarios list [--env-id ID]

# Open in console
veris scenarios get <SET_ID>

# Delete a scenario set
veris scenarios delete <SET_ID>
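If you script these commands, for example from CI, it can help to assemble the argument lists in one place. The helpers below are a hypothetical convenience sketch that uses only the subcommands and flags listed above. They build the commands without running them, and the set ID shown is a made-up placeholder.

```python
import shlex
from typing import List, Optional


def create_command(num: Optional[int] = None, env_id: Optional[str] = None) -> List[str]:
    """Argument list for `veris scenarios create [--num N] [--env-id ID]`."""
    argv = ["veris", "scenarios", "create"]
    if num is not None:
        argv += ["--num", str(num)]
    if env_id is not None:
        argv += ["--env-id", env_id]
    return argv


def status_command(set_id: str, watch: bool = False) -> List[str]:
    """Argument list for `veris scenarios status <SET_ID> [--watch]`."""
    argv = ["veris", "scenarios", "status", set_id]
    if watch:
        argv.append("--watch")
    return argv


print(shlex.join(create_command(num=10)))
print(shlex.join(status_command("set_123", watch=True)))  # "set_123" is a placeholder ID
```

Each list can be passed directly to `subprocess.run(argv, check=True)`. Using lists instead of one shell string avoids quoting problems when an ID contains unusual characters.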

Using the console

The Scenarios page lists all scenario sets with their title, status, scenario count, and the grader mapped to the set. Click a scenario set to view individual scenarios and their details.