On-Prem / Customer-Hosted Clusters
Run simulations, scenario generation, and reports on your own Kubernetes cluster (EKS or GKE). Your agent code stays on your infrastructure — Veris manages orchestration and collects results.
How It Works
What runs on your cluster: Simulation jobs, scenario generation, and report generation (your agent + Veris sandbox container).
What stays on Veris: Evaluation/grading, artifact storage (GCS), and the API.
Prerequisites
- A Kubernetes cluster (EKS or GKE)
- kubectl access to the cluster
- A container registry accessible from your cluster (ECR, Artifact Registry, or any private registry)
- A Veris organization with an API key
Infrastructure Requirements
Node Sizing
Veris jobs have different resource requirements. Your cluster must have nodes large enough to schedule them.
| Job Type | CPU Request | Memory Request | Memory Limit | Minimum Node Size |
|---|---|---|---|---|
| Scenario generation | 500m | 4 Gi | 16 Gi | 8 Gi+ RAM (e.g., t3.xlarge) |
| Report generation | 500m | 4 Gi | 16 Gi | 8 Gi+ RAM (e.g., t3.xlarge) |
| Simulation | 1000m | 2 Gi | 4 Gi | 4 Gi+ RAM (e.g., t3.large) |
Scenario generation and report generation are memory-intensive. Nodes with less than 8 Gi of allocatable memory (e.g., t3.medium with ~3.3 Gi allocatable) will fail to schedule these jobs.
Recommended: Use t3.xlarge (16 Gi) or equivalent nodes. This comfortably runs all job types and allows multiple concurrent simulations.
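To check whether existing nodes meet these sizes, you can compare each node's allocatable memory against the 8 Gi threshold. The sketch below is illustrative only: the helper names are hypothetical (not part of any Veris tooling), and it assumes kubectl reports allocatable memory with a Ki suffix, which is the common case.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the function names are hypothetical.

# Convert a Kubernetes memory quantity in Ki (e.g. "16102588Ki")
# to whole GiB, rounded down. Assumes the value carries a Ki suffix.
ki_to_gib() {
  local ki="${1%Ki}"
  echo $(( ki / 1024 / 1024 ))
}

# Succeed if the given allocatable memory meets the 8 Gi threshold
# this guide gives for scenario and report generation.
fits_generation_jobs() {
  [ "$(ki_to_gib "$1")" -ge 8 ]
}

# Print one line per node: name, allocatable memory, and a verdict.
check_nodes() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,MEM:.status.allocatable.memory |
  while read -r name mem; do
    if fits_generation_jobs "$mem"; then
      echo "$name $mem ok"
    else
      echo "$name $mem too-small"
    fi
  done
}
```

Calling check_nodes with your cluster's kubeconfig context active lists every node. A node reported as too-small can still run simulations if it meets the smaller simulation requirement in the table.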
EKS Auto Mode caveat. If you bring your own EKS cluster, prefer a managed node group (or self-managed nodes) over Auto Mode with a self-managed node IAM role. Auto Mode doesn’t reliably attach a custom node role to the EC2 instance profile it provisions, so its NodeClass gets stuck at InstanceProfileReady=False and no nodes ever launch — jobs sit Pending indefinitely. The Terraform module’s create_cluster path uses a managed node group for this reason; on GKE, a standard cluster’s node pool (with the built-in autoscaler) is the equivalent.
Network / Firewall Rules
Inbound: Your K8s API server must be reachable from Veris on its API port (typically 443). The Veris backend connects to your API server to create and manage jobs.
Outbound: Your cluster nodes need HTTPS (port 443) to the following destinations:
| Destination | Purpose | Required |
|---|---|---|
| storage.googleapis.com | Download scenario data, upload logs and results | Yes |
| api.openai.com | LLM calls (agent + Veris-internal seed/brief generation) | Yes |
| api.anthropic.com | LLM calls (agent + Veris-internal seed/brief generation) | Yes |
| *.openai.azure.com | Azure LLM fallback (if configured) | If using Azure |
| sandbox.api.veris.ai | Event reporting, backend callbacks | Recommended |
| Your container registry (e.g., ECR, Artifact Registry) | Pull simulation images | Yes |
LLM provider access is required even if your agent doesn’t directly call these APIs. Veris sandbox services use LLM calls internally for seed generation, brief creation, and scenario exploration.
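One way to verify the outbound rules is to attempt an HTTPS connection to each fixed hostname in the table. This is a minimal sketch with hypothetical function names. It leaves out your container registry and the Azure wildcard because those hostnames are specific to your setup. The result only reflects the cluster's egress when the script runs on a node or in a pod inside the cluster.

```shell
#!/usr/bin/env bash
# Fixed hostnames taken from the outbound table in this guide.
VERIS_EGRESS_HOSTS="storage.googleapis.com api.openai.com api.anthropic.com sandbox.api.veris.ai"

# Succeed if an HTTPS connection to the host completes within 10 seconds.
# Any HTTP status counts as reachable; only connection-level failures
# (DNS, TCP, TLS, timeout) make curl return non-zero here.
can_reach() {
  curl --silent --output /dev/null --max-time 10 "https://$1"
}

# Print a reachable/blocked line per host; return 1 if any host is blocked.
check_egress() {
  local failed=0 host
  for host in $VERIS_EGRESS_HOSTS; do
    if can_reach "$host"; then
      echo "reachable $host"
    else
      echo "blocked $host"
      failed=1
    fi
  done
  return "$failed"
}
```

A blocked line usually points at a firewall, proxy, or NAT rule rather than at the destination service, since the check ignores the HTTP status code.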
Container Registry
Your agent image must be accessible from your cluster’s nodes. Common setups:
ECR (AWS)
If your EKS cluster and ECR repo are in the same AWS account, image pull works automatically via the node IAM role. No additional configuration needed.
For cross-account ECR, configure imagePullSecrets on the veris namespace’s default ServiceAccount.
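A cross-account setup could look roughly like the sketch below. The function name and the default secret name ecr-pull are illustrative assumptions. The sketch also assumes the AWS credentials in use are allowed to pull from the other account's repository. ECR passwords are short-lived, so a secret created this way has to be refreshed periodically; re-running the function does that.

```shell
#!/usr/bin/env bash
# Sketch: create (or refresh) an ECR pull secret and attach it to the
# default ServiceAccount in the veris namespace.
# The secret name default ("ecr-pull") is an illustrative assumption.
configure_ecr_pull_secret() {
  local account_id="$1" region="$2" secret_name="${3:-ecr-pull}"
  local registry="${account_id}.dkr.ecr.${region}.amazonaws.com"

  # Render the secret client-side and apply it, so a re-run
  # replaces the stored token instead of failing on "already exists".
  kubectl create secret docker-registry "$secret_name" \
    --namespace veris \
    --docker-server="$registry" \
    --docker-username=AWS \
    --docker-password="$(aws ecr get-login-password --region "$region")" \
    --dry-run=client -o yaml | kubectl apply -f -

  # Reference the secret from the namespace's default ServiceAccount.
  kubectl patch serviceaccount default --namespace veris \
    -p "{\"imagePullSecrets\":[{\"name\":\"${secret_name}\"}]}"
}
```

For example, configure_ecr_pull_secret 123456789012 us-east-1 targets the registry of the account that owns the repository; the account ID here is a placeholder.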
Cluster Setup
Automated (Terraform)
We provide a Terraform module that wires up everything Veris needs — the namespace, RBAC, token, and container registry — in one terraform apply. Download the main.tf for your provider, set your variables, and apply.
The module works in two modes, controlled by create_cluster:
- Register an existing cluster (create_cluster = false, the default). Use this when you already run an EKS or GKE cluster. The module only adds the Veris namespace, RBAC, token, and registry; it never touches your nodes.
- Create a new cluster (create_cluster = true). The module also provisions a dedicated network and the compute: on AWS, a standard EKS cluster with a managed node group; on GCP, a standard GKE cluster with an autoscaling node pool. Both use node_machine_type / node_min_size / node_max_size / node_desired_size (defaults: t3.xlarge / e2-standard-4, 1 / 20 / 2).
On GKE, the cluster autoscaler is built into the control plane, so the node pool scales between node_min_size and node_max_size automatically — nothing extra to deploy. On EKS, the node group runs node_desired_size nodes; deploy Cluster Autoscaler to scale between min and max (the node group is pre-tagged for discovery). See node sizing for what Veris jobs require.
EKS
Download eks-main.tf and save it as main.tf.
Find your organization ID (org_…) and create a Veris API key on your Settings page. The org ID goes in veris_organization_id; the API key authenticates the registration call.
Register an existing cluster:
cat > terraform.tfvars <<EOF
cluster_name = "my-eks-cluster"
veris_organization_id = "org_your_org_id"
region = "us-east-1"
EOF
terraform init && terraform apply
Then register the cluster with Veris. The payload carries your veris_organization_id; the request is authenticated with your Veris API key:
export VERIS_API_KEY="vrs_your_api_key"
terraform output -raw veris_cluster_registration | \
curl -X POST https://sandbox.api.veris.ai/v1/clusters \
-H "Authorization: Bearer $VERIS_API_KEY" \
-H "Content-Type: application/json" \
-d @-
Or create a new cluster (VPC + standard EKS cluster + managed node group):
cat > terraform.tfvars <<EOF
create_cluster = true
cluster_name = "veris-eks"
veris_organization_id = "org_your_org_id"
region = "us-east-1"
# Node group sizing (defaults shown):
# node_instance_type = "t3.xlarge"
# node_min_size = 1
# node_max_size = 20
# node_desired_size = 2
EOF
terraform init && terraform apply
Then register with Veris (authenticated with your Veris API key):
export VERIS_API_KEY="vrs_your_api_key"
terraform output -raw veris_cluster_registration | \
curl -X POST https://sandbox.api.veris.ai/v1/clusters \
-H "Authorization: Bearer $VERIS_API_KEY" \
-H "Content-Type: application/json" \
-d @-
Environment Setup
Once your cluster is registered, set up an environment and push your agent image.
Create an environment
veris env create --name my-agent
Note the environment ID (e.g., env_abc123) from the output. You'll need it for the next step.
Create a container repository for your environment
Each environment gets its own repository path under your image registry.
ECR
ECR requires repositories to be created explicitly:
aws ecr create-repository \
--repository-name YOUR_REPO/ENV_ID \
--region YOUR_REGION
For example, with the module’s default registry name veris-simulation-images and environment ID env_abc123:
aws ecr create-repository \
--repository-name veris-simulation-images/env_abc123 \
--region us-east-1
Authenticate to your registry
veris env push builds locally and pushes to your registry, so Docker must be logged in to it first. The Veris base image is pulled with Veris-provided credentials automatically, but the final push uses your own registry credentials. Without them the push fails with:
push access denied ... authorization failed: no basic auth credentials
ECR
ECR login tokens are valid for 12 hours, so re-run this whenever it expires:
aws ecr get-login-password --region YOUR_REGION | \
docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com
ECR does not auto-create repositories. If you skipped the step above, the push also fails until the YOUR_REPO/ENV_ID repository exists.
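To avoid a failed push because of a missing repository, the create step can be made safe to re-run. The following is a minimal sketch with a hypothetical function name; it uses the same aws ecr create-repository call as above and only adds an existence check in front of it.

```shell
#!/usr/bin/env bash
# Sketch: create the per-environment ECR repository only if it is missing.
# The function name is hypothetical; the aws subcommands are standard AWS CLI.
ensure_ecr_repo() {
  local repo="$1" region="$2"
  if aws ecr describe-repositories \
       --repository-names "$repo" --region "$region" >/dev/null 2>&1; then
    echo "exists $repo"
  else
    # describe-repositories also fails on auth or network errors; in that
    # case create-repository fails too and the function returns non-zero.
    aws ecr create-repository \
      --repository-name "$repo" --region "$region" >/dev/null &&
      echo "created $repo"
  fi
}
```

With the example values from this guide, the call would be ensure_ecr_repo veris-simulation-images/env_abc123 us-east-1.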
Push your agent image
The Veris CLI detects that your organization has an external registry and automatically builds locally:
veris env push --tag latest
This will:
- Pull the Veris base image using credentials from the API
- Build your agent image locally using .veris/Dockerfile.sandbox
- Push to your external registry at YOUR_REGISTRY/ENV_ID:latest
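After a push to ECR, you may want to confirm that the tag landed where the cluster will pull it from. The helpers below are a sketch with hypothetical names. They assume YOUR_REGISTRY has the form ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/REPO, which matches the repository naming used earlier in this guide but is not confirmed by it.

```shell
#!/usr/bin/env bash
# Build the full image reference for an ECR-hosted environment image.
# Assumption: the registry path is ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/REPO.
ecr_image_ref() {
  local account_id="$1" region="$2" repo="$3" env_id="$4" tag="${5:-latest}"
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com/${repo}/${env_id}:${tag}"
}

# Succeed if the tag exists in the environment's ECR repository.
image_pushed() {
  local region="$1" repo="$2" env_id="$3" tag="${4:-latest}"
  aws ecr describe-images --region "$region" \
    --repository-name "${repo}/${env_id}" \
    --image-ids "imageTag=${tag}" >/dev/null
}
```

For example, image_pushed us-east-1 veris-simulation-images env_abc123 returns zero once the latest tag is present in that repository.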
Managing Your Cluster
Update credentials
When your token expires or you rotate credentials:
curl -X PUT https://sandbox.api.veris.ai/v1/clusters/CLUSTER_ID \
-H "Authorization: Bearer YOUR_VERIS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"credentials": {"token": "NEW_TOKEN"}}'
The cluster status resets to pending after credential updates. Run the connectivity test again to verify.
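If you rotate tokens regularly, the update call can be wrapped so the token never has to be pasted into the command by hand. This is a sketch under assumptions: the function names are hypothetical, and the payload builder assumes the token contains no characters that need JSON escaping, which holds for Kubernetes ServiceAccount tokens (dot-separated base64url segments) but not for arbitrary strings.

```shell
#!/usr/bin/env bash
# Build the JSON body used by the credentials update request.
# Assumes the token needs no JSON escaping (see the note above).
credentials_payload() {
  printf '{"credentials": {"token": "%s"}}' "$1"
}

# Send the update for one cluster. Expects VERIS_API_KEY in the environment.
update_cluster_token() {
  local cluster_id="$1" token="$2"
  credentials_payload "$token" |
    curl -sS -X PUT "https://sandbox.api.veris.ai/v1/clusters/${cluster_id}" \
      -H "Authorization: Bearer ${VERIS_API_KEY}" \
      -H "Content-Type: application/json" \
      -d @-
}
```

Usage would be update_cluster_token CLUSTER_ID "$NEW_TOKEN", where NEW_TOKEN holds whatever token your setup issues for the Veris ServiceAccount.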
Remove a cluster
curl -X DELETE https://sandbox.api.veris.ai/v1/clusters/CLUSTER_ID \
-H "Authorization: Bearer YOUR_VERIS_API_KEY"
How Jobs Run on Your Cluster
When you trigger a simulation, generation, or report, the Veris backend:
- Creates a per-run K8s Secret on your cluster containing short-lived GCS credentials and Veris-internal LLM keys. These are scoped per-run and automatically deleted after completion.
- Creates a K8s Job using your registered image. The job manifest is patched to remove Veris-specific scheduling (gVisor, node selectors, tolerations) so it runs on any available node.
- An init container downloads scenario data and configuration from Veris GCS using the injected credentials.
- Your agent runs inside the simulation container alongside Veris sandbox services (mock services, LLM proxy, engine).
- A sidecar container periodically uploads logs, results, and generated artifacts back to Veris GCS.
- On completion, the per-run Secret is automatically deleted from your cluster.
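The lifecycle above can be observed from your side with standard kubectl queries. The sketch below assumes the jobs run in the veris namespace mentioned earlier in this guide; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# List the Jobs, Pods, and Secrets currently present in the veris namespace.
# Per-run Secrets should appear while a run is active and be gone afterwards.
veris_run_overview() {
  kubectl get jobs,pods,secrets --namespace veris
}

# Show scheduling failures, oldest first. Useful when pods stay Pending,
# for example because no node has enough allocatable memory.
pending_pod_events() {
  kubectl get events --namespace veris \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}
```

Running veris_run_overview during and after a run is a quick way to confirm that the per-run Secret was cleaned up.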
Limitations
- Evaluation and grading run on Veris-managed infrastructure (not your cluster).
- One cluster per organization. Each Veris organization maps to one registered cluster.
- Supported providers. Currently EKS and GKE. Other K8s distributions may work with bearer token auth but are not officially supported.
- GCS-based artifacts. All artifacts (scenarios, logs, results) are stored in Veris-managed GCS buckets.
- GCS token lifetime. The short-lived GCS credential is valid for 1 hour. Jobs that exceed this window may fail to upload final results.