Introduction

Veris

Veris runs your AI agent end-to-end against a simulated version of its production environment. That means simulating both sides of the agent:

  • what talks to the agent — a human user chatting with it, a webhook from another system, a scheduled job, or a handoff from another agent. Veris drives any of these with realistic goals, context, and triggering payloads.
  • what the agent talks to — Slack, Stripe, Salesforce, Calendar, Postgres, and more. Veris intercepts outbound API calls and routes them to LLM-powered mocks that respond with scenario-appropriate data.
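One common way an agent stays mockable like this is to read each service's base URL from its environment, so a sandbox can redirect outbound calls without any code changes. The sketch below illustrates that general pattern only; the environment-variable names are assumptions, not Veris's actual mechanism:

```python
import os

# Illustrative assumption: production and the sandbox inject different
# base URLs for the same service, so the agent runs identical code in both.
STRIPE_BASE = os.environ.get("STRIPE_API_BASE", "https://api.stripe.com")

def charge_url(charge_id: str) -> str:
    # The agent builds requests against whatever base it was given; in a
    # sandbox this would resolve to a mock endpoint instead of Stripe.
    return f"{STRIPE_BASE}/v1/charges/{charge_id}"
```

In production the default applies; in simulation the variable is overridden to point at the mock, which is why no wrapper or simulation-only branch is needed in the agent itself.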

You push your agent as a container. Veris generates scenarios, drives them through the agent, grades the transcripts, and produces a report. Your agent runs the same code it runs in production. No wrappers, no simulation-only branches.

Agents fail in ways unit tests can’t catch — bad tool calls, wrong decisions mid-task, regressions after a prompt tweak. Veris is the loop for finding and fixing those before they hit production.

The same sandbox is used for the dev loop, CI regression gating, RL and SFT training, and regulatory QA. Integration is the one path you have to walk first; every use case flows from there.

Primitives

  • Environment — your agent packaged as a container, plus the services it’s allowed to talk to.
  • Scenario — a test case: the input that starts the agent (a simulated user with goals, or a triggering event like a webhook or scheduled job), the context it runs in, and what counts as success.
  • Simulation — one scenario executed end-to-end against your agent, in an isolated container.
  • Evaluation — graders that score each simulation transcript against its success criteria.
  • Report — root-cause analysis across a batch of simulations, with concrete fix suggestions.
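The Evaluation primitive can be pictured as a function from a simulation transcript plus success criteria to a score. A minimal deterministic sketch (all names hypothetical, not the Veris API; real graders would be richer, e.g. LLM judges and tool-call checks, but the shape is the same):

```python
from dataclasses import dataclass, field

@dataclass
class GradeResult:
    passed: bool
    missing: list = field(default_factory=list)

def grade_transcript(transcript: list, required_phrases: list) -> GradeResult:
    # Toy grader: pass only if every required phrase appears in the transcript.
    text = "\n".join(transcript).lower()
    missing = [p for p in required_phrases if p.lower() not in text]
    return GradeResult(passed=not missing, missing=missing)
```

A report then aggregates these per-scenario results across a batch and attaches root-cause analysis; the grader itself only scores one transcript.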

What the loop looks like

```shell
# Package and push your agent
veris env push

# Generate test cases from your agent's code
veris scenarios create

# Simulate, evaluate, report
veris run
```

This just shows the shape of the workflow. For a working “run it on my agent” path, see the Quickstart.

Start here

If you haven’t already, open the console and click an example agent — that’s the fastest “see it work” moment. Once you’ve seen it run there, come back and do the same with your own agent:

Quickstart — point your coding agent at the integration skill, or walk through the config yourself. ~15–30 minutes to your first report.