CI/CD regression gating

Use this page when you’ve got a working development loop and you want to prevent regressions from shipping. If you haven’t pushed your agent to Veris yet, start at Quickstart.

The pattern

On every pull request:

  1. Build your agent’s image.
  2. Push it to Veris with a PR-specific tag.
  3. Run the same scenario set you use in development.
  4. Fail the PR if eval scores drop below a baseline.

veris env push --tag "pr-${PR_NUMBER}"
veris run \
  --scenario-set-id <SET_ID> \
  --grader-id <GRADER_ID> \
  --image-tag "pr-${PR_NUMBER}" \
  --report

veris run exits non-zero if the evaluation fails, making it a natural CI gate.
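
Because the gate relies only on the exit status, a CI step can wrap the call and report the outcome without parsing any output. The following is a minimal POSIX shell sketch. `run_gate` is a hypothetical helper name, not part of the Veris CLI, and the commented invocation reuses the flags and placeholders from the command above.

```shell
# run_gate: hypothetical wrapper (not a Veris command). Runs the command it is
# given and returns that command's exit status, so the CI step fails exactly
# when the wrapped command fails.
run_gate() {
  if "$@"; then
    echo "gate: passed"
  else
    gate_status=$?
    echo "gate: failed (exit ${gate_status})" >&2
    return "${gate_status}"
  fi
}

# In CI (placeholders as in the command above):
# run_gate veris run --scenario-set-id <SET_ID> --grader-id <GRADER_ID> \
#   --image-tag "pr-${PR_NUMBER}" --report
```

Most CI systems mark a step as failed when its script returns non-zero, so propagating the status unchanged is usually all the gating logic a step needs.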

Non-interactive flags

In CI you can’t use interactive prompts. Pass all required flags:

  • --scenario-set-id — the scenario set to run against (pin this; don’t regenerate in CI).
  • --grader-id — the grader to evaluate with.
  • --image-tag — the tag you just pushed.
  • --report — generate a report you can link from the PR.
  • --simulation-timeout — per-simulation timeout in seconds.

Full reference: CLI commands.

Authentication

Use an API key, not browser OAuth:

veris login "$VERIS_API_KEY"

Keep VERIS_API_KEY in your CI’s secret store, not in the repo.
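
If the secret is not injected into a job, the login step can fail in ways that are harder to diagnose than a missing variable. One way to fail fast is sketched below. `require_vars` is a hypothetical helper, and the variable list in the commented usage is an assumption about what your pipeline exports. The helper reports only the variable's name and never prints its value.

```shell
# require_vars: hypothetical helper. Returns non-zero and names the first
# variable in its argument list that is unset or empty. The value itself is
# never echoed, so secrets stay out of the CI log.
require_vars() {
  for var_name in "$@"; do
    eval "var_value=\${${var_name}:-}"
    if [ -z "${var_value}" ]; then
      echo "missing required variable: ${var_name}" >&2
      return 1
    fi
  done
}

# In CI, before logging in:
# require_vars VERIS_API_KEY PR_NUMBER && veris login "$VERIS_API_KEY"
```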

Setting a threshold

Your grader produces scores per simulation. Your CI gate should compare the aggregate score against a baseline — usually the score of main on the same scenario set. Keep the baseline in a file in the repo or fetch it from Veris at run time.
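
The comparison itself can be a few lines of shell. This sketch assumes you already have the aggregate score in a variable. How you extract it depends on the report format, which this page does not specify. `score_ok`, `AGGREGATE_SCORE`, `baseline.txt`, and the optional tolerance argument are illustrative names, not Veris features.

```shell
# score_ok: hypothetical helper. Succeeds when SCORE is no more than
# TOLERANCE below BASELINE. awk does the floating-point comparison, which
# plain POSIX shell arithmetic cannot.
# Usage: score_ok SCORE BASELINE [TOLERANCE]   (TOLERANCE defaults to 0)
score_ok() {
  awk -v score="$1" -v baseline="$2" -v tolerance="${3:-0}" \
    'BEGIN { exit !((score + 0) >= (baseline - tolerance)) }'
}

# In CI, with the baseline committed as a one-line file (assumed name):
# score_ok "$AGGREGATE_SCORE" "$(cat baseline.txt)" 0.02 || exit 1
```

A small tolerance can keep run-to-run noise from failing PRs that do not actually regress. Whether you need one depends on how much your scores vary between identical runs.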

The Markdown summary that veris run writes to stdout is already structured for pasting into a PR comment. Most teams post it via gh pr comment.
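
Posting the summary should not hide the gate result: in a pipeline such as `veris run ... | tee summary.md`, the shell reports the status of `tee`, not of `veris run`. The sketch below avoids the pipe. `capture_summary` and `summary.md` are hypothetical names, and the commented usage assumes the GitHub CLI is installed and authenticated in the job.

```shell
# capture_summary: hypothetical helper. Writes the command's stdout to FILE,
# echoes it to the CI log, and returns the command's own exit status.
# Usage: capture_summary FILE COMMAND [ARGS...]
capture_summary() {
  summary_file=$1
  shift
  summary_status=0
  "$@" > "${summary_file}" || summary_status=$?
  cat "${summary_file}"
  return "${summary_status}"
}

# In CI: post the comment even when the gate fails, then fail the step.
# gate_status=0
# capture_summary summary.md veris run --scenario-set-id <SET_ID> \
#   --grader-id <GRADER_ID> --image-tag "pr-${PR_NUMBER}" --report \
#   || gate_status=$?
# gh pr comment "${PR_NUMBER}" --body-file summary.md
# exit "${gate_status}"
```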

Pin the scenario set. If you regenerate scenarios in CI, you’re comparing apples to oranges across PRs. Regenerate only when your agent’s interface or services change, and do it deliberately.
