Simulations & Runs

What is a simulation?

A simulation is one scenario running against your agent inside an isolated container. It produces a transcript — a record of the actor’s messages, your agent’s responses, every API call made to services, and the actor’s internal reasoning. The transcript is what gets graded afterward.

During a simulation:

1. A fresh container starts with your agent plus its declared services.
2. The services seed data based on the scenario.
3. The actor initiates the conversation.
4. Your agent responds, potentially calling services.
5. The exchange continues until the actor’s objectives are met or the turn budget runs out.
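The stop condition described above can be sketched as a small shell function. This is a toy model for illustration only, not the Veris implementation: the function name `simulate`, its arguments, and the output format are all hypothetical, and it assumes one turn means one actor message followed by one agent response.

```shell
# Toy model of when a simulation ends (hypothetical, not Veris code).
# $1 = turn budget (the MAX_TURNS value)
# $2 = turn on which the actor's objectives are met (0 = never)
simulate() {
  local max_turns="$1" met_at="$2" turn=0
  while [ "$turn" -lt "$max_turns" ]; do
    turn=$((turn + 1))   # assumed: actor sends a message, agent responds
    if [ "$turn" -eq "$met_at" ]; then
      echo "turns=$turn reason=objectives_met"
      return 0
    fi
  done
  # Budget used up before the objectives were met
  echo "turns=$turn reason=turn_budget_exhausted"
}
```

For example, `simulate 5 3` prints `turns=3 reason=objectives_met`, while `simulate 2 3` prints `turns=2 reason=turn_budget_exhausted` because the budget runs out first.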

Set `actor.config.MAX_TURNS` in `veris.yaml` to control the turn budget. For one-shot agents, set it to 1.
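As a rough illustration, the setting for a one-shot agent might look like the fragment below. The nesting is an assumption inferred from the dotted path `actor.config.MAX_TURNS`; any other keys your `veris.yaml` contains are omitted here.

```yaml
# veris.yaml (fragment). Nesting is assumed from the dotted path
# actor.config.MAX_TURNS; the rest of the file is not shown.
actor:
  config:
    MAX_TURNS: 1   # one-shot agent: a single turn
```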

What is a run?

A run is a batch of simulations. When you start a run, every scenario in the selected scenario set becomes one simulation, and they execute in parallel.

Creating a run

Interactive:


veris simulations create

The command prompts you for an environment and a scenario set.

With flags:


veris simulations create \
  --scenario-set-id scenset_abc123 \
  --env-id env_xyz \
  --simulation-timeout 300

Flag	Description
`--scenario-set-id`	Scenario set to run
`--env-id`	Environment to use
`--image-tag`	Image tag (default: `latest`)
`--simulation-timeout`	Per-simulation timeout in seconds (60–3600)
`--auto-evaluate` / `--no-auto-evaluate`	Auto-run the grader once simulations finish (default: on)

With `--auto-evaluate` on (the default), the run’s grader starts as soon as the last simulation completes, so no separate `veris evaluations create` step is needed.
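The flags table above gives a 60–3600 second range for `--simulation-timeout`. If you script run creation, a local check can catch an out-of-range value before the CLI is called. The wrapper below is a minimal sketch: the function name `create_run` and its argument order are hypothetical, it treats the range as inclusive (an assumption), and it uses only the flags documented above.

```shell
# Hypothetical wrapper: validate the timeout locally, then create the run.
# $1 = scenario set ID, $2 = environment ID, $3 = timeout in seconds
create_run() {
  local scenario_set_id="$1" env_id="$2" timeout="$3"

  # Reject empty or non-numeric values
  case "$timeout" in
    ''|*[!0-9]*)
      echo "timeout must be a whole number of seconds" >&2
      return 2
      ;;
  esac

  # Documented range is 60-3600; assumed inclusive here
  if [ "$timeout" -lt 60 ] || [ "$timeout" -gt 3600 ]; then
    echo "timeout must be between 60 and 3600 seconds" >&2
    return 2
  fi

  veris simulations create \
    --scenario-set-id "$scenario_set_id" \
    --env-id "$env_id" \
    --simulation-timeout "$timeout"
}
```

Calling `create_run scenset_abc123 env_xyz 300` runs the same command as the flags example above; `create_run scenset_abc123 env_xyz 30` returns a non-zero status without invoking the CLI.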

Monitoring progress


# One-time status
veris simulations status <RUN_ID>
 
# Watch mode (polls every 3 seconds)
veris simulations status <RUN_ID> --watch
 
# Include the event stream
veris simulations status <RUN_ID> --watch --log

In the console, the Simulations page shows each run’s progress bar, per-simulation status, and timing. Click into a simulation to see the full transcript: messages between the actor and agent, the actor’s internal reasoning, every API call with request/response payloads, and the raw events.

Cancelling


veris simulations cancel <RUN_ID>

This stops all pending and running simulations in the run.

CLI Commands


# Create a run
veris simulations create [--scenario-set-id ID] [--env-id ID]
 
# List runs
veris simulations list [--status STATUS] [--env-id ID]
 
# Check status
veris simulations status <RUN_ID> [--watch] [--log]
 
# Cancel
veris simulations cancel <RUN_ID>