Supervised Fine-Tuning (SFT)
Set up your agent
Configure your environment and tag LLM calls for trace capture.
Run simulations and evaluate
Run scenarios and evaluate the results.
Train on Veris
Select a base model and evaluation runs, then start training.
Export for external training
Export traces as JSONL for OpenAI, Fireworks, Gemini, or open-source platforms.
Deploy
Get an inference endpoint for your trained model and use it in your agent.
Supported Models
| Model | Parameters |
|---|---|
| Qwen 3.5 9B | 9B |
| Qwen 3.5 35B | 35B |
| DeepSeek-V3.2 | 685B MoE |
| DeepSeek-R1 | 685B MoE |
| Llama 4 Scout | 109B MoE |
| Llama 3.3 70B | 70B |
| Llama 3.1 8B | 8B |
| GPT-OSS 120B | 120B |
| GPT-OSS 20B | 20B |
| Kimi K2.5 | — |
| Nemotron 3 Super | — |
Agent Setup
For Veris to capture clean training data, your agent's LLM calls must include the X-Veris-Agent-Id header. This tags your agent's calls in the trace so the export pipeline can separate them from internal Veris LLM calls that would otherwise contaminate the training data.
Add the header to your LLM client:
OpenAI Agents SDK
from openai import AsyncOpenAI
from agents import set_default_openai_client
set_default_openai_client(
AsyncOpenAI(default_headers={"X-Veris-Agent-Id": "my-agent"})
)
Anthropic Python SDK
from anthropic import AsyncAnthropic
client = AsyncAnthropic(
default_headers={"X-Veris-Agent-Id": "my-agent"}
)
LiteLLM
import litellm
response = litellm.completion(
model="gpt-4o",
messages=[...],
extra_headers={"X-Veris-Agent-Id": "my-agent"}
)
Google ADK
from google.adk import Agent
from google.genai import Client
client = Client(
http_options={"headers": {"X-Veris-Agent-Id": "my-agent"}}
)
agent = Agent(model="gemini-2.0-flash", client=client, ...)
LangGraph / LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
default_headers={"X-Veris-Agent-Id": "my-agent"}
)
HTTP (any language)
curl https://api.openai.com/v1/chat/completions \
-H "X-Veris-Agent-Id: my-agent" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model": "gpt-4o", "messages": [...]}'
For multi-agent setups, each sub-agent should use a different ID. The export pipeline produces one training example per agent per simulation.
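For multi-agent setups, one lightweight pattern is to build the header dict once per sub-agent and pass it to whichever client pattern above you use. This is a minimal sketch; the sub-agent names are made up for illustration:

```python
def veris_headers(agent_id: str) -> dict[str, str]:
    """Build the trace-tagging header for one sub-agent."""
    return {"X-Veris-Agent-Id": agent_id}

# One distinct ID per sub-agent, so the export pipeline can emit
# one training example per agent per simulation.
SUB_AGENTS = ["planner", "researcher", "writer"]
headers_by_agent = {name: veris_headers(name) for name in SUB_AGENTS}
```

Each dict can then be passed as `default_headers` (OpenAI/Anthropic/LangChain) or `extra_headers` (LiteLLM) when constructing that sub-agent's client.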
Training on Veris
End-to-end SFT on Veris is coming soon. You can export datasets today and fine-tune on external platforms.
Exporting a Dataset
Export your evaluation run traces as training-ready JSONL. Select evaluation runs as the data source and choose an output format.
Output formats
| Format | Target | Tool calls | Thinking tokens |
|---|---|---|---|
| OpenAI | OpenAI fine-tuning API | Yes | No (stripped) |
| Fireworks | Fireworks fine-tuning | Yes | Yes (reasoning_content) |
| Gemini | Vertex AI tuning | Yes | No (stripped) |
| Open Source | Unsloth, TRL, any JSONL loader | Yes | Yes (<think> tags) |
API
curl -X POST https://api.veris.ai/v1/training/datasets \
-H "Authorization: Bearer $VERIS_API_KEY" \
-d '{
"environment_id": "env_xxx",
"training_type": "sft",
"evaluation_run_ids": ["evalrun_xxx"],
"config": {
"format": "openai",
"include_thinking": false,
"include_system_prompt": true,
"include_tool_definitions": true
}
}'
The export runs as a background job. Poll the dataset status until it reaches completed.
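The status route itself isn't shown above, so this polling loop is written against any status-fetching callable you supply; wiring it to a GET on the dataset resource (and the exact status strings beyond completed) is an assumption to check against your Veris API reference:

```python
import time


def poll_until_complete(fetch_status, interval_s=5.0, timeout_s=3600.0):
    """Poll fetch_status() until the export reaches a terminal state.

    fetch_status is any callable returning the dataset's current status
    string -- e.g. a thin wrapper around an authenticated GET on the
    dataset resource (route and "failed" status are assumptions).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("dataset export failed")
        time.sleep(interval_s)
    raise TimeoutError("dataset export did not finish in time")
```

In practice fetch_status would issue the request with the same Authorization: Bearer $VERIS_API_KEY header used to create the dataset.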
Output Structure
The export produces two files in GCS, plus a summary:
training-datasets/{dataset_id}/
data.jsonl # Training data — upload directly to your platform
index.jsonl # Metadata — for filtering and traceability
manifest.json # Summary — counts, format, duration
data.jsonl
Vendor-ready training examples. One JSON object per line, directly uploadable to OpenAI, Fireworks, Gemini, or any JSONL-based training pipeline. No extra fields.
OpenAI format example:
{
"messages": [
{"role": "system", "content": "You are a support agent..."},
{"role": "user", "content": "I need to cancel my card"},
{"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "cancel_card", "arguments": "{\"card_id\": \"123\"}"}}]},
{"role": "tool", "content": "{\"status\": \"cancelled\"}", "tool_call_id": "call_abc"},
{"role": "assistant", "content": "Your card has been cancelled."}
],
"tools": [{"type": "function", "function": {"name": "cancel_card", ...}}]
}
index.jsonl
Parallel to data.jsonl (same row count, same order). Each line contains metadata for the corresponding training example:
{
"row": 0,
"simulation_id": "sim_xxx",
"agent_id": "my-agent",
"scenario_id": "scn_xxx",
"model": "gpt-4o",
"message_count": 12,
"tool_count": 5,
"grading_result": {"score": {"tool_use": 0.95, "accuracy": 0.88}},
"assertion_result": {"passed": 4, "failed": 0}
}
Use index.jsonl to filter training data. For example, keep only examples where grading_result.score.accuracy >= 0.9, then extract the matching rows from data.jsonl.
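Because the two files are row-aligned, that filter is a single pass over both files with zip. A minimal sketch (the 0.9 cutoff and file handling are illustrative):

```python
import json


def filter_dataset(data_lines, index_lines, min_accuracy=0.9):
    """Keep data.jsonl rows whose index.jsonl metadata clears the bar.

    data_lines and index_lines are parallel iterables of JSONL strings,
    in the same order the export wrote them.
    """
    kept = []
    for data_line, index_line in zip(data_lines, index_lines):
        meta = json.loads(index_line)
        score = meta.get("grading_result", {}).get("score", {})
        if score.get("accuracy", 0.0) >= min_accuracy:
            kept.append(data_line)
    return kept
```

Write the kept lines to a new filtered.jsonl and upload that to your fine-tuning platform instead of the raw data.jsonl.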
manifest.json
Summary of the export: total simulations processed, examples exported, failures, format, and duration.
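Based on the fields listed above, a manifest might look roughly like this; the exact key names and the duration unit are assumptions, not a schema guarantee:

```json
{
  "format": "openai",
  "simulations_processed": 42,
  "examples_exported": 40,
  "failures": 2,
  "duration_seconds": 57
}
```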