Scenarios & Graders

What is a Scenario?

A scenario is a test case that describes a realistic interaction between simulated actors and your agent. It defines:

  • Who the actors are (personas with roles, objectives, and knowledge)
  • What they want to achieve (concrete objectives)
  • What the environment looks like (external context like holidays or outages)
  • How success is measured (assertions — path-agnostic boolean checks)

Scenario Schema

Scenarios follow the ScenarioContent model:

    scenario_id: refund_request
    title: Handle a Double-Charge Refund
    description: >
      Tests the agent's ability to identify a duplicate charge, verify it
      against the billing system, and process a refund.
    actors:
      - name: Maria Chen
        type: persona
        role: >
          A frustrated customer who was charged twice for the same restaurant
          order. She is direct and expects a quick resolution. Has a debit
          card and a credit card on file.
        objectives:
          - objective: Get the duplicate charge identified and confirmed
          - objective: Receive a refund for the duplicate charge
        knowledge: >
          Knows her user ID. Knows the restaurant name and approximate date
          of the charge. Does NOT know the transaction IDs.
    environment:
      description: >
        The billing system is experiencing intermittent slowness due to a
        scheduled maintenance window.
    initial_turn:
      actor: Maria Chen
      action: >
        Hi, I just noticed I was charged twice for the same meal at
        Bella Cucina last Thursday. Can you help me sort this out?
    briefs:
      salesforce: >
        Customer Maria Chen (user ID: MC-4892) has two charges of $47.50
        from Bella Cucina on March 5. One is the original charge, one is a
        duplicate. She has a Visa debit card ending in 3847.

Schema Reference

ScenarioContent

| Field | Type | Description |
| --- | --- | --- |
| title | string | Concise title of the scenario |
| description | string | Short (1-2 sentence) description of what the scenario tests. No step-by-step flows. |
| actors | list | List of actors involved in the scenario |
| environment | object (optional) | External context unrelated to any actor, such as holidays, market conditions, or system outages |
| initial_turn | object (optional) | The opening message that kicks off the scenario |

Actor

| Field | Type | Description |
| --- | --- | --- |
| name | string | Full name of the actor |
| type | "persona" or "entity" | persona for simulated humans, entity for non-human participants |
| role | string | Character description: demeanor, profession, emotional state, relevant context. Not a behavioral script. |
| objectives | list | Concrete goals the actor pursues, in rough order |
| knowledge | string (optional) | What the actor knows at the start: categories of information, not actual values |

Objective

| Field | Type | Description |
| --- | --- | --- |
| objective | string | A concrete goal the actor is trying to achieve |

Environment

| Field | Type | Description |
| --- | --- | --- |
| description | string | External context like holidays, market conditions, system outages |

Initial Turn

| Field | Type | Description |
| --- | --- | --- |
| actor | string | Name of the actor taking the first turn (must match an actor name) |
| action | string | The opening message or action |
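
The constraints in the reference tables above can be checked before a scenario is uploaded. The following is a minimal sketch, not part of the Veris tooling: `validate_scenario` is a hypothetical helper, and it assumes the scenario has already been loaded into a plain Python dict with the field names documented above.

```python
def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # title, description, and actors are not marked optional in the reference.
    for key in ("title", "description", "actors"):
        if not scenario.get(key):
            problems.append(f"missing required field: {key}")

    actors = scenario.get("actors") or []
    names = {a.get("name") for a in actors}
    for a in actors:
        if a.get("type") not in ("persona", "entity"):
            problems.append(
                f"actor {a.get('name')!r}: type must be 'persona' or 'entity'"
            )

    # initial_turn is optional, but its actor must match an actor name.
    turn = scenario.get("initial_turn")
    if turn is not None and turn.get("actor") not in names:
        problems.append(
            f"initial_turn.actor {turn.get('actor')!r} does not match any actor name"
        )
    return problems


scenario = {
    "title": "Handle a Double-Charge Refund",
    "description": "Tests duplicate-charge detection and refund handling.",
    "actors": [
        {
            "name": "Maria Chen",
            "type": "persona",
            "role": "A frustrated customer who was charged twice.",
            "objectives": [
                {"objective": "Receive a refund for the duplicate charge"}
            ],
        }
    ],
    "initial_turn": {"actor": "Maria Chen", "action": "Hi, I was charged twice."},
}

print(validate_scenario(scenario))  # prints []
```

A check like this catches the most likely authoring mistake, an initial_turn whose actor name is misspelled, before a simulation is started.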

Briefs

The briefs field (on the full Scenario model) contains pre-generated seed data keyed by service or actor name. These are sent to mock services during the seed phase so they generate data consistent with the scenario.

Assertions

Assertions define success for a scenario as independent boolean checks. Each assertion is path-agnostic — it checks the outcome, not the steps taken.

    assertions:
      - Duplicate charge was identified in the billing system
      - Refund was processed for the correct amount
      - Customer was informed of the expected refund timeline

Assertions are evaluated after the simulation completes. Each statement is checked against the simulation transcript and service state.

| Field | Type | Description |
| --- | --- | --- |
| statements | list of strings | 2-3 independent boolean checks that define success |

Write assertions as task-oriented outcomes: “Service history was retrieved,” “Customer was informed of the refund policy.” Avoid describing the steps the agent should take.
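
Because each assertion is an independent boolean, the results for one scenario can be summarized in a few lines. This sketch is illustrative only. The result values are made up, `summarize` is a hypothetical helper, and it assumes a scenario counts as successful only when every assertion holds, which the text implies but does not state.

```python
def summarize(results: dict[str, bool]) -> dict:
    """Summarize per-assertion booleans for a single scenario."""
    failed = [statement for statement, ok in results.items() if not ok]
    total = len(results)
    return {
        # Assumption: success requires every assertion to hold.
        "passed": total > 0 and not failed,
        "pass_rate": (total - len(failed)) / total if total else 0.0,
        "failed": failed,
    }


# Hypothetical evaluation results, one boolean per assertion statement.
results = {
    "Duplicate charge was identified in the billing system": True,
    "Refund was processed for the correct amount": True,
    "Customer was informed of the expected refund timeline": False,
}

summary = summarize(results)
print(summary["passed"])  # prints False
print(summary["failed"])
```

Keeping the list of failed statements alongside the overall result makes it clear which outcome was missed without requiring a reread of the transcript.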

Scenario Sets

Scenarios are organized into scenario sets — collections generated or uploaded together. When you run veris scenarios generate, the result is a scenario set containing multiple scenarios, along with matching graders and assertions.

What is a Grader?

A grader is an evaluation function that reads simulation transcripts and scores agent performance. Graders support two evaluator types:

Python Evaluator

Executes a Python function against the trace:

    def grade(trace):
        """trace is a list of dicts (agent trace entries)."""
        tool_calls = [e for e in trace if e.get("type") == "tool_call"]
        return 1.0 if len(tool_calls) > 0 else 0.0
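
To see how this evaluator behaves, the sketch below calls the same `grade` function on two hand-written traces. Only the "type" key and the "tool_call" value come from the example above. The other keys and values in the trace entries are assumptions made for illustration, not the documented trace format.

```python
def grade(trace):
    """trace is a list of dicts (agent trace entries)."""
    tool_calls = [e for e in trace if e.get("type") == "tool_call"]
    return 1.0 if len(tool_calls) > 0 else 0.0


# Hypothetical trace in which the agent called a tool once.
with_tool = [
    {"type": "message", "content": "Let me look up your charges."},
    {"type": "tool_call", "name": "lookup_charges"},
]

# Hypothetical trace in which the agent only replied with text.
without_tool = [
    {"type": "message", "content": "Sorry, I can't help with that."},
]

print(grade(with_tool))     # prints 1.0
print(grade(without_tool))  # prints 0.0
```

Using `e.get("type")` rather than `e["type"]` means entries without a "type" key are skipped instead of raising a KeyError.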

LLM-as-a-Judge (Score Model)

Uses an LLM to evaluate the trace with Jinja-templated prompts:

    type: score_model
    model: gpt-4o-2024-08-06
    messages:
      - role: system
        content: |
          Evaluate whether the agent hallucinated any information.
          Trace: {{ trace }}
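
To make the templating step concrete, the sketch below fills the `{{ trace }}` placeholder in the system prompt. It is a standard-library stand-in for Jinja's variable substitution, not the Jinja engine itself. The `render` helper is hypothetical, and passing the trace as JSON is an assumption about how it is serialized.

```python
import json
import re

PROMPT = (
    "Evaluate whether the agent hallucinated any information.\n"
    "Trace: {{ trace }}"
)


def render(template: str, **variables) -> str:
    """Replace {{ name }} placeholders; a tiny stand-in for Jinja rendering."""
    def substitute(match):
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical trace; serializing it as JSON is an assumption.
trace = [{"type": "tool_call", "name": "lookup_charges"}]
rendered = render(PROMPT, trace=json.dumps(trace))
print(rendered)
```

The judge model therefore receives the full trace inline in its prompt, so very long traces may need trimming to fit the model's context window.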

Composite Graders

Graders can be composed into a tree using type: multi. Children run concurrently, then a calculate_output function aggregates results.
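
The fan-out and aggregate pattern described here can be sketched in plain Python. This does not show how Veris implements it. The child graders, the `run_multi` helper, and the signature of `calculate_output` (a dict of child scores in, one float out) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def has_tool_call(trace):
    """Child grader: 1.0 if the trace contains at least one tool call."""
    return 1.0 if any(e.get("type") == "tool_call" for e in trace) else 0.0


def is_short(trace):
    """Hypothetical child grader: 1.0 if the trace has at most 10 entries."""
    return 1.0 if len(trace) <= 10 else 0.0


def calculate_output(scores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of the child scores."""
    return sum(scores.values()) / len(scores)


def run_multi(children: dict, trace: list) -> float:
    """Run every child grader concurrently, then aggregate the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, trace) for name, fn in children.items()}
        scores = {name: future.result() for name, future in futures.items()}
    return calculate_output(scores)


children = {"has_tool_call": has_tool_call, "is_short": is_short}
trace = [{"type": "message"}, {"type": "tool_call"}]
score = run_multi(children, trace)
print(score)  # prints 1.0
```

Running children concurrently matters most when some of them are LLM judges, since the total wait is then close to the slowest child rather than the sum of all of them.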

Generating Scenarios

CLI:

    veris scenarios generate
    veris scenarios generate --num 10

Console: Navigate to Scenarios & Graders and click Generate. Select the number of scenarios and confirm. The generation process analyzes your agent’s code, identifies capabilities and service integrations, and produces scenarios that exercise different code paths and edge cases.

You can also create scenarios in the Console via Scenarios & Graders → New Scenario.

CLI Commands

    # List scenario sets
    veris scenarios list

    # View a scenario set
    veris scenarios get SCENARIO_SET_ID

    # Generate scenarios and graders
    veris scenarios generate

    # List available graders
    veris eval list

Using the Console

The Scenarios & Graders page lists all scenario sets with their title, status, scenario count, and associated graders. Click a scenario set to view individual scenarios and their details. Grader names appear as badges — click to view grader configuration.