Scenarios & Graders
What is a Scenario?
A scenario is a test case that describes a realistic interaction between simulated actors and your agent. It defines:
- Who the actors are (personas with roles, objectives, and knowledge)
- What they want to achieve (concrete objectives)
- What the environment looks like (external context like holidays or outages)
- How success is measured (assertions — path-agnostic boolean checks)
Scenario Schema
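Scenarios follow the ScenarioContent model. The example below also includes the briefs field, which belongs to the full Scenario model (see Briefs):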
```yaml
scenario_id: refund_request
title: Handle a Double-Charge Refund
description: >
  Tests the agent's ability to identify a duplicate charge,
  verify it against the billing system, and process a refund.
actors:
  - name: Maria Chen
    type: persona
    role: >
      A frustrated customer who was charged twice for the same
      restaurant order. She is direct and expects a quick resolution.
      Has a debit card and a credit card on file.
    objectives:
      - objective: Get the duplicate charge identified and confirmed
      - objective: Receive a refund for the duplicate charge
    knowledge: >
      Knows her user ID. Knows the restaurant name and approximate
      date of the charge. Does NOT know the transaction IDs.
environment:
  description: >
    The billing system is experiencing intermittent slowness
    due to a scheduled maintenance window.
initial_turn:
  actor: Maria Chen
  action: >
    Hi, I just noticed I was charged twice for the same meal
    at Bella Cucina last Thursday. Can you help me sort this out?
briefs:
  salesforce: >
    Customer Maria Chen (user ID: MC-4892) has two charges of $47.50
    from Bella Cucina on March 5. One is the original charge, one is
    a duplicate. She has a Visa debit card ending in 3847.
```
Schema Reference
ScenarioContent
| Field | Type | Description |
|---|---|---|
| title | string | Concise title of the scenario |
| description | string | Short (1-2 sentence) description of what the scenario tests. No step-by-step flows. |
| actors | list | List of actors involved in the scenario |
| environment | object (optional) | External context unrelated to any actor: holidays, market conditions, system outages |
| initial_turn | object (optional) | The opening message that kicks off the scenario |
Actor
| Field | Type | Description |
|---|---|---|
| name | string | Full name of the actor |
| type | "persona" or "entity" | persona for simulated humans, entity for non-human participants |
| role | string | Character description: demeanor, profession, emotional state, relevant context. Not a behavioral script. |
| objectives | list | Concrete goals the actor pursues, in rough order |
| knowledge | string (optional) | What the actor knows at the start: categories of information, not actual values |
Objective
| Field | Type | Description |
|---|---|---|
| objective | string | A concrete goal the actor is trying to achieve |
Environment
| Field | Type | Description |
|---|---|---|
| description | string | External context like holidays, market conditions, system outages |
Initial Turn
| Field | Type | Description |
|---|---|---|
| actor | string | Name of the actor taking the first turn (must match an actor name) |
| action | string | The opening message or action |
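The field rules in the tables above can be checked before a scenario is uploaded. The following is a minimal sketch. It assumes the scenario has already been parsed into a plain Python dict, for example with a YAML parser. `validate_scenario` is a hypothetical helper, not part of the Veris CLI, and it checks only the rules these tables state. It also treats missing and empty values the same way.

```python
from typing import Any

ACTOR_TYPES = {"persona", "entity"}
REQUIRED_TOP_LEVEL = ("title", "description", "actors")
REQUIRED_ACTOR_FIELDS = ("name", "type", "role", "objectives")


def validate_scenario(scenario: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper covering only the ScenarioContent, Actor, and
    Initial Turn table rules. Missing or empty values count as absent.
    """
    problems: list[str] = []

    # Fields not marked "(optional)" in the ScenarioContent table.
    for field in REQUIRED_TOP_LEVEL:
        if not scenario.get(field):
            problems.append(f"missing required field: {field}")

    actor_names: set[str] = set()
    for actor in scenario.get("actors") or []:
        label = actor.get("name") or "<unnamed actor>"
        if actor.get("name"):
            actor_names.add(actor["name"])
        # Fields not marked "(optional)" in the Actor table.
        for field in REQUIRED_ACTOR_FIELDS:
            if not actor.get(field):
                problems.append(f"{label}: missing {field}")
        if actor.get("type") and actor["type"] not in ACTOR_TYPES:
            problems.append(f"{label}: type must be 'persona' or 'entity'")

    # initial_turn is optional, but when present its actor must match an actor name.
    initial_turn = scenario.get("initial_turn")
    if initial_turn and initial_turn.get("actor") not in actor_names:
        problems.append(
            f"initial_turn.actor {initial_turn.get('actor')!r} does not match any actor name"
        )

    return problems


scenario = {
    "title": "Handle a Double-Charge Refund",
    "description": "Tests identifying and refunding a duplicate charge.",
    "actors": [{
        "name": "Maria Chen",
        "type": "persona",
        "role": "A frustrated customer who was charged twice.",
        "objectives": [{"objective": "Receive a refund for the duplicate charge"}],
    }],
    "initial_turn": {"actor": "Maria C.", "action": "Hi, I was charged twice."},
}
print(validate_scenario(scenario))
# ["initial_turn.actor 'Maria C.' does not match any actor name"]
```

The helper returns every problem it finds instead of stopping at the first one. This means a single run reports all of the fixes a scenario needs.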
Briefs
The briefs field (on the full Scenario model) contains pre-generated seed data keyed by service or actor name. These are sent to mock services during the seed phase so they generate data consistent with the scenario.
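Briefs are keyed by service or actor name, so a mistyped key would leave a mock service without seed data. The following is a minimal sketch of a consistency check. `unmatched_brief_keys` is a hypothetical helper. The caller must supply the `known_services` set, because this page does not say how the list of mock services is discovered.

```python
def unmatched_brief_keys(
    briefs: dict[str, str], actor_names: set[str], known_services: set[str]
) -> list[str]:
    """Return brief keys that match neither an actor name nor a known service.

    Hypothetical helper: the comparison is exact and case-sensitive.
    """
    valid_keys = actor_names | known_services
    return sorted(key for key in briefs if key not in valid_keys)


briefs = {
    "salesforce": "Customer Maria Chen has two charges of $47.50 from Bella Cucina.",
    "salesfroce": "Mistyped key: no mock service would receive this brief.",
}
print(unmatched_brief_keys(briefs, {"Maria Chen"}, {"salesforce"}))  # ['salesfroce']
```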
Assertions
Assertions define success for a scenario as independent boolean checks. Each assertion is path-agnostic — it checks the outcome, not the steps taken.
```yaml
assertions:
  - Duplicate charge was identified in the billing system
  - Refund was processed for the correct amount
  - Customer was informed of the expected refund timeline
```
Assertions are evaluated after the simulation completes. Each statement is checked against the simulation transcript and service state.
| Field | Type | Description |
|---|---|---|
| statements | list of strings | 2-3 independent boolean checks that define success |
Write assertions as task-oriented outcomes: “Service history was retrieved,” “Customer was informed of the refund policy.” Avoid describing the steps the agent should take.
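The schema expects two or three independent statements. The sketch below checks that rule. It then shows one way to combine per-statement verdicts, under an assumption this page does not state: a scenario counts as passing only when every assertion holds. Both function names are hypothetical. The verdicts themselves would come from whatever checks each statement against the transcript and service state.

```python
def check_statements(statements: list[str]) -> list[str]:
    """Flag violations of the 'statements' field rules (hypothetical helper)."""
    problems = []
    if not 2 <= len(statements) <= 3:
        problems.append(f"expected 2-3 statements, got {len(statements)}")
    if any(not s.strip() for s in statements):
        problems.append("statements must be non-empty strings")
    if len(set(statements)) != len(statements):
        problems.append("duplicate statements found")
    return problems


def scenario_passed(verdicts: dict[str, bool]) -> bool:
    """Assumption: the scenario passes only if every assertion evaluated to True."""
    return bool(verdicts) and all(verdicts.values())


statements = [
    "Duplicate charge was identified in the billing system",
    "Refund was processed for the correct amount",
    "Customer was informed of the expected refund timeline",
]
print(check_statements(statements))  # []

verdicts = {s: True for s in statements}
verdicts["Refund was processed for the correct amount"] = False
print(scenario_passed(verdicts))  # False
```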
Scenario Sets
Scenarios are organized into scenario sets — collections generated or uploaded together. When you run veris scenarios generate, the result is a scenario set containing multiple scenarios, along with matching graders and assertions.
What is a Grader?
A grader is an evaluation function that reads simulation transcripts and scores agent performance. Graders support two evaluator types:
Python Evaluator
Executes a Python function against the trace:
```python
def grade(trace):
    """trace is a list of dicts (agent trace entries)."""
    tool_calls = [e for e in trace if e.get("type") == "tool_call"]
    return 1.0 if len(tool_calls) > 0 else 0.0
```
LLM-as-a-Judge (Score Model)
Uses an LLM to evaluate the trace with Jinja-templated prompts:
```yaml
type: score_model
model: gpt-4o-2024-08-06
messages:
  - role: system
    content: |
      Evaluate whether the agent hallucinated any information.
      Trace: {{ trace }}
```
Composite Graders
Graders can be composed into a tree using type: multi. Children run concurrently, then a calculate_output function aggregates results.
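The composition rule can be illustrated in plain Python. This is a sketch under assumptions, not the Veris `multi` implementation. The second child grader, the trace entry fields beyond `type`, and the `calculate_output` signature are all hypothetical. Here `calculate_output` takes a dict mapping child name to score and returns one score. Children run concurrently in a thread pool, and their scores are then aggregated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Grader = Callable[[list[dict]], float]


def used_tools(trace: list[dict]) -> float:
    """Same check as the Python evaluator example: any tool call scores 1.0."""
    return 1.0 if any(e.get("type") == "tool_call" for e in trace) else 0.0


def no_errors(trace: list[dict]) -> float:
    """Hypothetical child grader: assumes failures appear as entries with type "error"."""
    return 0.0 if any(e.get("type") == "error" for e in trace) else 1.0


def calculate_output(scores: dict[str, float]) -> float:
    """Hypothetical aggregation: unweighted mean of child scores (needs at least one child)."""
    return sum(scores.values()) / len(scores)


def run_multi(children: dict[str, Grader], trace: list[dict]) -> float:
    # Run every child grader concurrently on the same trace, then aggregate.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(grader, trace) for name, grader in children.items()}
        scores = {name: future.result() for name, future in futures.items()}
    return calculate_output(scores)


trace = [
    {"type": "message", "content": "Hi, I was charged twice."},
    {"type": "tool_call", "name": "lookup_charges"},
    {"type": "error", "message": "billing system timeout"},
]
print(run_multi({"used_tools": used_tools, "no_errors": no_errors}, trace))  # 0.5
```

`run_multi` returns a single float, just like a leaf grader. A nested composite can therefore be passed as a child, for example `lambda t: run_multi(subtree, t)`, which gives the tree shape described above.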
Generating Scenarios
CLI:
```bash
veris scenarios generate
veris scenarios generate --num 10
```
Console: Navigate to Scenarios & Graders and click Generate. Select the number of scenarios and confirm. The generation process analyzes your agent’s code, identifies capabilities and service integrations, and produces scenarios that exercise different code paths and edge cases.
You can also create scenarios in the Console via Scenarios & Graders → New Scenario.
CLI Commands
```bash
# List scenario sets
veris scenarios list

# View a scenario set
veris scenarios get SCENARIO_SET_ID

# Generate scenarios and graders
veris scenarios generate

# List available graders
veris eval list
```
Using the Console
The Scenarios & Graders page lists all scenario sets with their title, status, scenario count, and associated graders. Click a scenario set to view individual scenarios and their details. Grader names appear as badges — click to view grader configuration.