Supervised Fine-Tuning (SFT)
Set up your agent
Configure your environment and tag LLM calls for trace capture.
Run simulations and evaluate
Run scenarios and evaluate the results.
Train on Veris
Select a base model and evaluation runs, then start training.
Export for external training
Export traces as JSONL for OpenAI, Fireworks, Gemini, or open-source platforms.
Deploy
Get an inference endpoint for your trained model and use it in your agent.
Supported Models
| Model | Parameters |
|---|---|
| Qwen 3.5 9B | 9B |
| Qwen 3.5 35B | 35B |
| DeepSeek-V3.2 | 685B MoE |
| DeepSeek-R1 | 685B MoE |
| Llama 4 Scout | 109B MoE |
| Llama 3.3 70B | 70B |
| Llama 3.1 8B | 8B |
| GPT-OSS 120B | 120B |
| GPT-OSS 20B | 20B |
| Kimi K2.5 | — |
| Nemotron 3 Super | — |
Agent Setup
For Veris to capture clean training data, your agent's LLM calls must include the X-Veris-Agent-Id header. This tags your agent's calls in the trace so the export pipeline can separate them from internal Veris LLM calls that would otherwise contaminate the training data.
Add the header to your LLM client:
OpenAI Agents SDK
from openai import AsyncOpenAI
from agents import set_default_openai_client
set_default_openai_client(
AsyncOpenAI(default_headers={"X-Veris-Agent-Id": "my-agent"})
)
Anthropic Python SDK
from anthropic import AsyncAnthropic
client = AsyncAnthropic(
default_headers={"X-Veris-Agent-Id": "my-agent"}
)
LiteLLM
import litellm
response = litellm.completion(
model="gpt-4o",
messages=[...],
extra_headers={"X-Veris-Agent-Id": "my-agent"}
)
Google ADK
from google.adk import Agent
from google.genai import Client
client = Client(
http_options={"headers": {"X-Veris-Agent-Id": "my-agent"}}
)
agent = Agent(model="gemini-2.0-flash", client=client, ...)
LangGraph / LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
default_headers={"X-Veris-Agent-Id": "my-agent"}
)
HTTP (any language)
curl https://api.openai.com/v1/chat/completions \
-H "X-Veris-Agent-Id: my-agent" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model": "gpt-4o", "messages": [...]}'
For multi-agent setups, each sub-agent should use a different ID. The export pipeline produces one training example per agent per simulation.
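For multi-agent setups, one lightweight pattern is to build the header dict once per sub-agent and pass it to whichever client pattern above you use. This is a minimal sketch; the sub-agent names are made up for illustration:

```python
def veris_headers(agent_id: str) -> dict[str, str]:
    """Build the trace-tagging header for one sub-agent."""
    return {"X-Veris-Agent-Id": agent_id}

# One distinct ID per sub-agent, so the export pipeline can emit
# one training example per agent per simulation.
SUB_AGENTS = ["planner", "researcher", "writer"]
headers_by_agent = {name: veris_headers(name) for name in SUB_AGENTS}
```

Each dict can then be passed as `default_headers` (OpenAI/Anthropic/LangChain) or `extra_headers` (LiteLLM) when constructing that sub-agent's client.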
Training on Veris
End-to-end SFT on Veris is coming soon. You can export datasets today and fine-tune on external platforms.
Exporting a Dataset
Export your evaluation run traces as training-ready JSONL. Select evaluation runs as the data source and choose an output format.
Output formats
| Format | Target | Tool calls | Thinking tokens |
|---|---|---|---|
| OpenAI | OpenAI fine-tuning API | Yes | No (stripped) |
| Fireworks | Fireworks fine-tuning | Yes | Yes (reasoning_content) |
| Gemini | Vertex AI tuning | Yes | No (stripped) |
| Open Source | Unsloth, TRL, any JSONL loader | Yes | Yes (<think> tags) |
API
curl -X POST https://api.veris.ai/v1/training/datasets \
-H "Authorization: Bearer $VERIS_API_KEY" \
-d '{
"environment_id": "env_xxx",
"training_type": "sft",
"evaluation_run_ids": ["evalrun_xxx"],
"config": {
"format": "openai",
"include_thinking": false,
"include_system_prompt": true,
"include_tool_definitions": true
}
}'
The export runs as a background job. Poll the dataset status until it reaches completed.
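The status route itself isn't shown above, so this polling loop is written against any status-fetching callable you supply; wiring it to a GET on the dataset resource (and the exact status strings beyond completed) is an assumption to check against your Veris API reference:

```python
import time


def poll_until_complete(fetch_status, interval_s=5.0, timeout_s=3600.0):
    """Poll fetch_status() until the export reaches a terminal state.

    fetch_status is any callable returning the dataset's current status
    string -- e.g. a thin wrapper around an authenticated GET on the
    dataset resource (route and "failed" status are assumptions).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("dataset export failed")
        time.sleep(interval_s)
    raise TimeoutError("dataset export did not finish in time")
```

In practice fetch_status would issue the request with the same Authorization: Bearer $VERIS_API_KEY header used to create the dataset.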
Output Structure
The export produces two files in GCS, plus a summary:
training-datasets/{dataset_id}/
data.jsonl # Training data — upload directly to your platform
index.jsonl # Metadata — for filtering and traceability
manifest.json # Summary — counts, format, duration
data.jsonl
Vendor-ready training examples. One JSON object per line, directly uploadable to OpenAI, Fireworks, Gemini, or any JSONL-based training pipeline. No extra fields.
OpenAI format example:
{
"messages": [
{"role": "system", "content": "You are a support agent..."},
{"role": "user", "content": "I need to cancel my card"},
{"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "cancel_card", "arguments": "{\"card_id\": \"123\"}"}}]},
{"role": "tool", "content": "{\"status\": \"cancelled\"}", "tool_call_id": "call_abc"},
{"role": "assistant", "content": "Your card has been cancelled."}
],
"tools": [{"type": "function", "function": {"name": "cancel_card", ...}}]
}
index.jsonl
Parallel to data.jsonl (same row count, same order). Each line contains metadata for the corresponding training example:
{
"row": 0,
"simulation_id": "sim_xxx",
"agent_id": "my-agent",
"scenario_id": "scn_xxx",
"model": "gpt-4o",
"message_count": 12,
"tool_count": 5,
"grading_result": {"score": {"tool_use": 0.95, "accuracy": 0.88}},
"assertion_result": {"passed": 4, "failed": 0}
}
Use index.jsonl to filter training data. For example, keep only examples where grading_result.score.accuracy >= 0.9, then extract the matching rows from data.jsonl.
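Because the two files are row-aligned, that filter is a single pass over both files with zip. A minimal sketch (the 0.9 cutoff and file handling are illustrative):

```python
import json


def filter_dataset(data_lines, index_lines, min_accuracy=0.9):
    """Keep data.jsonl rows whose index.jsonl metadata clears the bar.

    data_lines and index_lines are parallel iterables of JSONL strings,
    in the same order the export wrote them.
    """
    kept = []
    for data_line, index_line in zip(data_lines, index_lines):
        meta = json.loads(index_line)
        score = meta.get("grading_result", {}).get("score", {})
        if score.get("accuracy", 0.0) >= min_accuracy:
            kept.append(data_line)
    return kept
```

Write the kept lines to a new filtered.jsonl and upload that to your fine-tuning platform instead of the raw data.jsonl.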
manifest.json
Summary of the export: total simulations processed, examples exported, failures, format, and duration.
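Based on the fields listed above, a manifest might look roughly like this; the exact key names and the duration unit are assumptions, not a schema guarantee:

```json
{
  "format": "openai",
  "simulations_processed": 42,
  "examples_exported": 40,
  "failures": 2,
  "duration_seconds": 57
}
```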