Full Walkthrough
A complete, step-by-step guide to testing your AI agent with Veris — from installation to your first evaluation report.
If you just want to get running quickly, see the Quickstart.
Prerequisites
- An AI agent with an HTTP, WebSocket, or email interface. This can be a chatbot, a customer support agent, an automation tool, or any agent that responds to user messages.
- Docker installed and running on your machine.
- Python 3.11+ (for the Veris CLI).
Step 1: Install the Veris CLI
The Veris CLI is the primary interface for packaging your agent, pushing it to the Veris cloud, and managing your test workflow.
```bash
pip install veris-cli
# or using uv (recommended for isolated tool installs)
uv tool install veris-cli
```
Verify the installation:
```bash
veris --version
```
Step 2: Authenticate
Before you can interact with the Veris platform, you need to authenticate. The CLI supports two methods:
Browser login (recommended)
```bash
veris login
```
This opens your browser for Google OAuth. After authorizing, the CLI stores your credentials locally at ~/.veris/config.yaml.
API key login (for CI/CD)
```bash
veris login YOUR_API_KEY
```
Use this method for headless environments like CI pipelines.
The CLI supports profiles for managing multiple environments (dev, staging, prod). Use veris --profile staging login to configure a separate profile. See CLI Reference for details.
Step 3: Initialize Your Project
Navigate to your agent’s project directory and initialize it for Veris:
```bash
cd ~/my-agent
veris init --name "my-support-agent"
```
This command does two things:
- Creates a .veris/ directory with three template files: Dockerfile.sandbox, veris.yaml, and .dockerignore.
- Creates an environment on the Veris platform and saves its ID to .veris/config.yaml. An environment is a named container that holds versioned images of your agent.
An environment in Veris is like a repository for your agent. It stores Docker image tags (versions) and associated configuration. You’ll see it listed on the Environments page in the Console.
Step 4: Configure veris.yaml
The .veris/veris.yaml file is the heart of your Veris configuration. It tells the sandbox three things:
- Which mock services your agent needs (Salesforce, Calendar, Stripe, etc.)
- How the simulated user communicates with your agent (HTTP, WebSocket, or email)
- How to start your agent inside the container
Here’s an example for an agent that uses Google Calendar and accepts HTTP requests:
```yaml
services:
  - name: calendar
    dns_aliases:
      - www.googleapis.com
      - calendar.google.com
actor:
  channels:
    - type: http
      url: http://localhost:8008
      method: POST
      headers:
        Content-Type: application/json
      request:
        message_field: message
        session_field: session_id
      response:
        type: json
        message_field: response
agent:
  code_path: /agent
  entry_point: python -m app.main
  port: 8008
  environment:
    GOOGLE_APPLICATION_CREDENTIALS: /certs/mock-service-account.json
```
Understanding services
Each entry in services enables a mock API inside the sandbox. The dns_aliases field lists the domains your agent calls — these are intercepted via DNS and routed to the mock service instead of the real API. Your agent doesn’t need any code changes.
For example, if your agent calls https://www.googleapis.com/calendar/v3/..., the sandbox intercepts that request and routes it to the Calendar mock service, which uses an LLM to generate a contextually appropriate response based on the scenario.
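To make the "no code changes" point concrete, here is a small sketch of agent-side code that builds a request against the real Google Calendar endpoint. The helper name is our own, not part of any SDK; inside the sandbox, the same URL simply resolves to the mock via the dns_aliases above.

```python
from urllib.parse import urlencode

def calendar_events_url(calendar_id: str, max_results: int = 10) -> str:
    # The exact URL the agent would call in production. Inside the
    # sandbox, DNS for www.googleapis.com points at the Calendar mock,
    # so this code runs unmodified in both environments.
    base = f"https://www.googleapis.com/calendar/v3/calendars/{calendar_id}/events"
    return f"{base}?{urlencode({'maxResults': max_results})}"
```

The agent fires this request exactly as it would against the live API; only the DNS resolution differs inside the sandbox.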
Understanding the actor
The actor section defines how the simulated user (persona) communicates with your agent. The actor is an LLM-powered agent that follows the objectives defined in a scenario. It supports five communication channels:
- Chat (HTTP) — sends POST/GET requests to your agent’s chat API endpoint (JSON or SSE streaming)
- Email — sends emails that your agent processes asynchronously
- Voice — simulates phone/voice interactions with your agent
- Browser-use — drives a headless browser to interact with your agent’s web UI
- WebSocket — maintains a persistent connection for real-time messaging
The request and response fields tell the actor how to format messages and parse your agent’s replies. For SSE streaming responses, set response.type: sse and configure the event fields.
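As a minimal sketch of what the actor expects on the other end of the HTTP channel configured above, here is a stdlib-only endpoint that reads message/session_id from the request body and replies with a response key. The handler and function names are our own; any real framework (FastAPI, Flask, etc.) works the same way as long as the field names match veris.yaml.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_chat(payload: dict) -> dict:
    """Produce a reply in the shape the actor expects.

    The actor sends {"message": ..., "session_id": ...} (per
    request.message_field / request.session_field) and parses the
    reply's "response" key (per response.message_field).
    """
    user_message = payload["message"]
    session_id = payload["session_id"]
    # Replace this echo with your real agent logic.
    return {"response": f"[{session_id}] You said: {user_message}"}

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(handle_chat(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve on the port from veris.yaml:
#   HTTPServer(("0.0.0.0", 8008), ChatHandler).serve_forever()
```

If you rename any JSON field here, update request.message_field, request.session_field, or response.message_field to match.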
Understanding the agent block
The agent section tells the sandbox where your code lives and how to start it:
- code_path — directory where your code is copied (default: /agent)
- entry_point — the command to start your agent (e.g., python -m app.main or uvicorn app:app)
- port — the port your agent listens on (default: 8008)
- environment — environment variables injected at runtime
See the full veris.yaml Reference for all options.
Step 5: Configure the Dockerfile
The .veris/Dockerfile.sandbox extends the Veris base image and adds your agent’s code and dependencies. The base image already contains all mock services, the simulation engine, and supporting infrastructure.
```dockerfile
FROM us-central1-docker.pkg.dev/veris-ai-prod/veris-sandbox/veris-gvisor:latest

# Copy and install dependencies
COPY requirements.txt /agent/
RUN pip install -r /agent/requirements.txt

# Copy your agent code
COPY app /agent/app

# Return to the Veris app directory (required)
WORKDIR /app
```
The final WORKDIR /app is required. The Veris entrypoint script lives at /app and must be the working directory when the container starts. Your agent code should be placed at /agent (or the path specified in agent.code_path).
The base image uses Python 3.12 with uv pre-installed. If you use uv for dependency management:
```dockerfile
FROM us-central1-docker.pkg.dev/veris-ai-prod/veris-sandbox/veris-gvisor:latest

COPY pyproject.toml uv.lock /agent/
WORKDIR /agent
RUN uv sync --frozen --no-dev

COPY app /agent/app
WORKDIR /app
```
See the Dockerfile Reference for more examples, including Node.js agents and database schemas.
Step 6: Set Environment Variables
If your agent needs API keys or other secrets at runtime, set them as environment variables on the Veris platform. These are securely injected into the container at runtime and never embedded in your Docker image.
```bash
# Set secrets (encrypted at rest)
veris env set OPENAI_API_KEY=sk-... --secret
veris env set ANTHROPIC_API_KEY=sk-ant-... --secret

# Set non-secret config
veris env set LOG_LEVEL=info
```
You can also define environment variables in the agent.environment section of veris.yaml for non-sensitive values. Runtime variables set with veris env set take precedence.
For local development with veris run local, create a .env file in your project root. The local runner loads it automatically.
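For example, a minimal .env for local runs might mirror the keys you set on the platform (values here are placeholders):

```ini
OPENAI_API_KEY=sk-...
LOG_LEVEL=debug
```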
Step 7: Push Your Agent
Build and push your agent image to the Veris registry:
```bash
veris env push
```
This command:
- Creates a new image tag in your environment
- Builds the Docker image locally using your Dockerfile.sandbox
- Pushes the image to the Veris container registry
By default, images are tagged latest. Use --tag v1.0 to create named versions:
```bash
veris env push --tag v1.0
```
If you prefer to build in the cloud (useful for CI or Apple Silicon Macs):
```bash
veris env push --remote
```
After pushing, your environment will show the new tag on the Environments page, and the image will appear on the Images page.
Step 8: Generate Scenarios & Graders
Scenarios are test cases that define:
- Persona — who is the simulated user? (name, background, personality)
- Objectives — what does the user want to accomplish?
- Context — what data should mock services have? (e.g., “User has 3 meetings tomorrow”)
- Success criteria — how do we know the agent succeeded?
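As a sketch, a scenario covering these four parts might look like the following. The field names here are illustrative assumptions, not the authoritative Veris schema; inspect the generated files in your scenarios/ directory for the real shape.

```yaml
# Hypothetical scenario sketch -- field names are illustrative
persona:
  name: Dana Reyes
  background: Operations manager who schedules frequent meetings
  personality: Direct, slightly impatient
objectives:
  - Reschedule tomorrow's 10am sync to the afternoon
context:
  calendar: User has 3 meetings tomorrow
success_criteria:
  - The meeting is moved without creating a duplicate event
max_turns: 12
```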
Graders are evaluation criteria that analyze simulation transcripts for specific failure modes (hallucination, incorrect tool usage, poor communication, etc.).
Veris can auto-generate both scenarios and graders by analyzing your agent’s code:
```bash
veris scenarios generate
```
This launches an async job that explores your agent's codebase and produces a scenario set (a collection of scenarios) along with matching graders. You can monitor progress with:
```bash
veris scenarios list
```
You can also generate scenarios from the Console by navigating to Scenarios → Generate.
Auto-generation works best when your agent’s code is well-structured with clear endpoint handlers and service integration patterns. The generator uses Claude to analyze your code and produce realistic test scenarios.
Step 9: Run Simulations
Create a run to execute your scenarios:
```bash
veris run create
```
The interactive prompt asks you to select a scenario set and set concurrency (how many simulations run in parallel). You can also provide these as flags:
```bash
veris run create \
  --scenario-set-id ss_abc123 \
  --concurrency 10
```
Each scenario in the set becomes one simulation. During a simulation:
- A fresh container starts with your agent and mock services
- The mock services seed their data based on the scenario context
- The actor (simulated user) sends the first message based on the persona’s objectives
- Your agent processes the message, potentially calling mock services
- The actor evaluates the response, decides on the next action, and continues
- The simulation ends when the actor completes all objectives or max_turns is reached
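The turn-taking above can be sketched as a simple loop. This is a conceptual illustration, not the actual Veris engine; actor_reply and agent_reply stand in for the LLM actor and your agent's endpoint.

```python
from typing import Callable, Optional

def run_simulation(
    actor_reply: Callable[[list], Optional[str]],  # returns None once objectives are met
    agent_reply: Callable[[str], str],
    max_turns: int = 20,
) -> list:
    """Alternate actor and agent turns until the actor finishes or max_turns is hit."""
    transcript: list = []
    for _ in range(max_turns):
        user_message = actor_reply(transcript)
        if user_message is None:  # actor has completed all objectives
            break
        transcript.append(("user", user_message))
        transcript.append(("agent", agent_reply(user_message)))
    return transcript
```

The real engine also seeds mock-service data from the scenario context before the first turn and records tool calls alongside messages; the loop structure is the part this sketch shows.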
Monitor progress:
```bash
# CLI - watch mode polls every 3 seconds
veris run status RUN_ID --watch

# Or view real-time progress in the Console:
# Navigate to Simulations → click on your run
```
Running locally
For faster iteration during development, you can run simulations locally without pushing to the cloud:
```bash
# Run all scenarios in the scenarios/ directory
veris run local

# Run a specific scenario
veris run local schedule_meeting

# Skip rebuilding the image
veris run local --skip-build
```
Local runs use Docker on your machine and load environment variables from a .env file in your project root.
Step 10: Evaluate Results
After simulations complete, run graders against the transcripts:
```bash
veris evaluation-runs create
```
The interactive prompt asks which completed run and which grader to use. Graders analyze each simulation transcript and produce scores and findings for categories like:
- Hallucination detection (did the agent fabricate information?)
- Tool execution verification (did the agent actually call the right APIs?)
- Communication quality (was the agent clear and helpful?)
- Procedural correctness (did the agent follow the right steps?)
- Success criteria fulfillment (were the user’s goals achieved?)
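To illustrate the kind of check a grader performs, here is a toy version of tool-execution verification. This is a conceptual sketch only; real Veris graders run LLM analysis over full transcripts rather than set comparisons.

```python
def tool_calls_match_claims(claimed_actions: set, executed_calls: set) -> dict:
    """Flag actions the agent claimed to perform but never actually executed.

    claimed_actions: tool actions the agent's messages assert it took
    executed_calls: tool calls actually recorded in the transcript
    """
    unverified = claimed_actions - executed_calls
    return {
        "passed": not unverified,
        "findings": sorted(f"claimed but not executed: {a}" for a in unverified),
    }
```

A transcript where the agent says "I've emailed you the invite" with no corresponding send call would fail this check, which is the essence of the tool-execution category above.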
Watch the evaluation progress:
```bash
veris evaluation-runs status EVAL_RUN_ID --run-id RUN_ID --watch
```
You can also trigger evaluations from the Console on the Evaluations page.
Step 11: Generate a Report
Finally, generate a report that aggregates your evaluation results:
```bash
veris reports create
```
Reports analyze patterns across all simulations in an evaluation run, identifying systemic issues and providing actionable insights. Download the report as HTML:
```bash
veris reports get REPORT_ID -o my-report.html
```
Or view it directly in the Console under Reports.
The Iteration Loop
Once you’ve completed your first pass, the workflow becomes a tight iteration loop:
- Review the report to identify agent weaknesses
- Fix the issues in your agent’s code
- Push a new image version: veris env push --tag v1.1
- Re-run simulations: veris run create
- Re-evaluate: veris evaluation-runs create
- Generate a new report and compare
Use veris run local during active development for faster feedback, then run cloud simulations at scale once you’re confident in the changes.
Next Steps
- Explore available services — Salesforce, Calendar, Stripe, Jira, PostgreSQL, and more
- veris.yaml Reference — all configuration options including SSE streaming, email channels, and custom headers
- CLI Command Reference — complete list of commands and flags
- Console Guide — navigate the web dashboard