CI/CD regression gating
Use this page when you’ve got a working development loop and you want to prevent regressions from shipping. If you haven’t pushed your agent to Veris yet, start at Quickstart.
The pattern
On every pull request:
- Build your agent’s image.
- Push it to Veris with a PR-specific tag.
- Run the same scenario set you use in development.
- Fail the PR if eval scores drop below a baseline.
veris env push --tag "pr-${PR_NUMBER}"
veris run \
  --scenario-set-id <SET_ID> \
  --grader-id <GRADER_ID> \
  --image-tag "pr-${PR_NUMBER}" \
  --report
veris run exits non-zero if the evaluation fails, making it a natural CI gate.
Non-interactive flags
In CI you can’t use interactive prompts. Pass all required flags:
- --scenario-set-id: the scenario set to run against (pin this; don’t regenerate in CI).
- --grader-id: the grader to evaluate with.
- --image-tag: the tag you just pushed.
- --report: generate a report you can link from the PR.
- --simulation-timeout: per-simulation timeout in seconds.
Full reference: CLI commands.
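As a rough sketch, the flags above can be wrapped in one function so every CI job calls veris run the same way. The function name run_veris_gate, the SET_ID, GRADER_ID, and SIMULATION_TIMEOUT environment variables, and the 600-second default are illustrative assumptions, not part of the Veris CLI.

```shell
# Hypothetical wrapper: run the pinned scenario set against this PR's image.
# SET_ID and GRADER_ID are assumed to be provided by the CI environment.
# The 600-second default timeout is an arbitrary example value.
run_veris_gate() {
  veris run \
    --scenario-set-id "${SET_ID}" \
    --grader-id "${GRADER_ID}" \
    --image-tag "pr-$1" \
    --simulation-timeout "${SIMULATION_TIMEOUT:-600}" \
    --report
}

# Usage in a CI step:
#   run_veris_gate "${PR_NUMBER}"
```

Because the function's last command is veris run, its exit status carries through unchanged, so the CI step still fails when the evaluation fails.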
Authentication
Use an API key, not browser OAuth:
veris login $VERIS_API_KEY
Keep VERIS_API_KEY in your CI’s secret store, not in the repo.
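A missing secret is a common cause of confusing CI failures, so it can help to check for the key before logging in. The sketch below assumes only that the key arrives as the VERIS_API_KEY environment variable. The helper name require_api_key is hypothetical.

```shell
# Hypothetical guard: fail fast with a clear message if the secret
# was not injected into the job's environment.
require_api_key() {
  if [ -z "${VERIS_API_KEY:-}" ]; then
    echo "VERIS_API_KEY is not set; add it to the CI secret store" >&2
    return 1
  fi
}

# Usage in a CI step:
#   require_api_key && veris login "$VERIS_API_KEY"
```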
Setting a threshold
Your grader produces scores per simulation. Your CI gate should compare the aggregate score against a baseline — usually the score of main on the same scenario set. Keep the baseline in a file in the repo or fetch it from Veris at run time.
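One way to sketch the comparison, assuming you have already extracted the aggregate score into a shell variable. How you obtain that number depends on your grader and report format, and is not shown here. The helper name score_meets_baseline, the optional tolerance argument, and the baseline.txt file are illustrative assumptions.

```shell
# Hypothetical helper: succeeds when score >= baseline - tolerance.
# Arguments: score, baseline, optional tolerance (defaults to 0).
# awk does the comparison because the shell's test command cannot
# compare floating-point numbers.
score_meets_baseline() {
  awk -v s="$1" -v b="$2" -v t="${3:-0}" 'BEGIN { exit !(s + 0 >= b - t) }'
}

# Usage in a CI step (SCORE comes from your own extraction step,
# baseline.txt is a hypothetical file holding main's score):
#   score_meets_baseline "$SCORE" "$(cat baseline.txt)" 0.02 || exit 1
```

A small tolerance keeps ordinary run-to-run variation from failing PRs; set it to 0 if you want a strict gate.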
The markdown summary veris run writes to stdout is already structured for pasting into a PR comment. Most teams post it via gh pr comment.
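A minimal sketch of that flow, assuming stdout contains only the markdown summary and that gh is already authenticated in the job. The function name run_and_comment is hypothetical. It posts the comment even when the evaluation fails, then returns the exit status of veris run so the gate still works.

```shell
# Hypothetical helper: capture the summary, post it to the PR, then return
# the exit status of veris run so a failed evaluation still fails the job.
# Arguments: PR number, followed by the flags to pass to veris run.
run_and_comment() {
  pr_number="$1"
  shift
  summary_file="$(mktemp)"
  status=0
  veris run "$@" > "$summary_file" || status=$?
  # A failed comment should not mask the evaluation result.
  gh pr comment "$pr_number" --body-file "$summary_file" \
    || echo "could not post PR comment" >&2
  rm -f "$summary_file"
  return "$status"
}

# Usage in a CI step (SET_ID and GRADER_ID are placeholders):
#   run_and_comment "$PR_NUMBER" --scenario-set-id "$SET_ID" \
#     --grader-id "$GRADER_ID" --image-tag "pr-${PR_NUMBER}" --report
```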
Pin the scenario set. If you regenerate scenarios in CI, you’re comparing apples to oranges across PRs. Regenerate only when your agent’s interface or services change, and do it deliberately.
See also
- CI/CD configuration — example configs for GitHub Actions, GitLab CI, and others
- CLI commands
- Development loop