Supervised Fine-Tuning (SFT)

1. Set up your agent: Configure your environment and tag LLM calls for trace capture.

2. Run simulations and evaluate: Run scenarios and evaluate the results.

3. Train on Veris: Select a base model and evaluation runs, then start training.

4. Export for external training: Export traces as JSONL for OpenAI, Fireworks, Gemini, or open-source platforms.

5. Deploy: Get an inference endpoint for your trained model and use it in your agent.

Supported Models

| Model | Parameters |
| --- | --- |
| Qwen 3.5 9B | 9B |
| Qwen 3.5 35B | 35B |
| DeepSeek-V3.2 | 685B MoE |
| DeepSeek-R1 | 685B MoE |
| Llama 4 Scout | 109B MoE |
| Llama 3.3 70B | 70B |
| Llama 3.1 8B | 8B |
| GPT-OSS 120B | 120B |
| GPT-OSS 20B | 20B |
| Kimi K2.5 | |
| Nemotron 3 Super | |

Agent Setup

For Veris to capture clean training data, your agent’s LLM calls need to include the X-Veris-Agent-Id header. This tags your agent’s calls in the trace and filters out internal Veris LLM calls that would contaminate training data.

Add the header to your LLM client:

OpenAI Agents SDK

```python
from openai import AsyncOpenAI
from agents import set_default_openai_client

set_default_openai_client(
    AsyncOpenAI(default_headers={"X-Veris-Agent-Id": "my-agent"})
)
```

Anthropic Python SDK

```python
from anthropic import AsyncAnthropic

client = AsyncAnthropic(
    default_headers={"X-Veris-Agent-Id": "my-agent"}
)
```

LiteLLM

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[...],
    extra_headers={"X-Veris-Agent-Id": "my-agent"},
)
```

Google ADK

```python
from google.adk import Agent
from google.genai import Client

client = Client(
    http_options={"headers": {"X-Veris-Agent-Id": "my-agent"}}
)
agent = Agent(model="gemini-2.0-flash", client=client, ...)
```

LangGraph / LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    default_headers={"X-Veris-Agent-Id": "my-agent"},
)
```

HTTP (any language)

```bash
curl https://api.openai.com/v1/chat/completions \
  -H "X-Veris-Agent-Id: my-agent" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

For multi-agent setups, give each sub-agent its own ID. The export pipeline produces one training example per agent per simulation.
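A minimal sketch of per-agent tagging. The agent names and ID values below are illustrative, not part of the Veris API; the only requirement is that each sub-agent's LLM client sends a distinct X-Veris-Agent-Id:

```python
# Illustrative multi-agent setup: each sub-agent gets a distinct ID so the
# export pipeline can emit one training example per agent per simulation.
# The names and ID strings here are examples, not reserved values.
AGENT_IDS = {
    "planner": "my-agent-planner",
    "researcher": "my-agent-researcher",
}

def veris_headers(agent_name: str) -> dict:
    """Header dict to pass as default_headers/extra_headers for that agent's client."""
    return {"X-Veris-Agent-Id": AGENT_IDS[agent_name]}
```

Pass the result of `veris_headers(...)` wherever the SDK snippets above set `default_headers` or `extra_headers`.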

Training on Veris

End-to-end SFT on Veris is coming soon. You can export datasets today and fine-tune on external platforms.

Exporting a Dataset

Export your evaluation run traces as training-ready JSONL. Select evaluation runs as the data source and choose an output format.

Output formats

| Format | Target | Tool calls | Thinking tokens |
| --- | --- | --- | --- |
| OpenAI | OpenAI fine-tuning API | Yes | No (stripped) |
| Fireworks | Fireworks fine-tuning | Yes | Yes (`reasoning_content`) |
| Gemini | Vertex AI tuning | Yes | No (stripped) |
| Open Source | Unsloth, TRL, any JSONL loader | Yes | Yes (`<think>` tags) |

API

```bash
curl -X POST https://api.veris.ai/v1/training/datasets \
  -H "Authorization: Bearer $VERIS_API_KEY" \
  -d '{
    "environment_id": "env_xxx",
    "training_type": "sft",
    "evaluation_run_ids": ["evalrun_xxx"],
    "config": {
      "format": "openai",
      "include_thinking": false,
      "include_system_prompt": true,
      "include_tool_definitions": true
    }
  }'
```

The export runs as a background job. Poll the dataset status until it reaches completed.
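A sketch of the poll loop. The `fetch_dataset` callable stands in for whatever GETs the dataset resource with your Authorization header and returns its JSON body; the exact endpoint URL and the terminal status values ("completed", "failed") are assumptions to verify against the API reference:

```python
import time

def wait_for_dataset(fetch_dataset, interval: float = 5.0, max_polls: int = 120) -> dict:
    """Poll fetch_dataset() until the export reaches a terminal status.

    fetch_dataset is a caller-supplied function that retrieves the dataset
    resource and returns it as a dict with a "status" field (assumed).
    """
    for _ in range(max_polls):
        dataset = fetch_dataset()
        status = dataset.get("status")
        if status == "completed":
            return dataset
        if status == "failed":
            raise RuntimeError(f"dataset export failed: {dataset}")
        time.sleep(interval)  # wait between polls
    raise TimeoutError("dataset export did not complete in time")
```

Injecting the fetch function keeps the loop testable and independent of any particular HTTP client.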

Output Structure

The export produces two files in GCS, plus a summary:

```
training-datasets/{dataset_id}/
  data.jsonl     # Training data — upload directly to your platform
  index.jsonl    # Metadata — for filtering and traceability
  manifest.json  # Summary — counts, format, duration
```

data.jsonl

Vendor-ready training examples. One JSON object per line, directly uploadable to OpenAI, Fireworks, Gemini, or any JSONL-based training pipeline. No extra fields.

OpenAI format example:

```json
{
  "messages": [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "I need to cancel my card"},
    {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "cancel_card", "arguments": "{\"card_id\": \"123\"}"}}]},
    {"role": "tool", "content": "{\"status\": \"cancelled\"}", "tool_call_id": "call_abc"},
    {"role": "assistant", "content": "Your card has been cancelled."}
  ],
  "tools": [{"type": "function", "function": {"name": "cancel_card", ...}}]
}
```

index.jsonl

Parallel to data.jsonl (same row count, same order). Each line contains metadata for the corresponding training example:

```json
{
  "row": 0,
  "simulation_id": "sim_xxx",
  "agent_id": "my-agent",
  "scenario_id": "scn_xxx",
  "model": "gpt-4o",
  "message_count": 12,
  "tool_count": 5,
  "grading_result": {"score": {"tool_use": 0.95, "accuracy": 0.88}},
  "assertion_result": {"passed": 4, "failed": 0}
}
```

Use index.jsonl to filter training data. For example, keep only examples where grading_result.score.accuracy >= 0.9, then extract the matching rows from data.jsonl.
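A sketch of that filtering step, relying only on the documented guarantee that the two files are parallel (line i of index.jsonl describes line i of data.jsonl). The 0.9 cutoff is the example threshold from the text, not a recommended default:

```python
import json

def filter_by_accuracy(index_lines, data_lines, min_accuracy=0.9):
    """Keep data.jsonl lines whose index.jsonl row meets the accuracy threshold."""
    kept = []
    for meta_line, data_line in zip(index_lines, data_lines):
        meta = json.loads(meta_line)
        score = meta.get("grading_result", {}).get("score", {})
        # Rows without an accuracy score are dropped rather than kept.
        if score.get("accuracy", 0.0) >= min_accuracy:
            kept.append(data_line)
    return kept
```

In practice you would read both files from the exported GCS prefix, write the kept lines to a new JSONL file, and upload that file to your training platform.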

manifest.json

Summary of the export: total simulations processed, examples exported, failures, format, and duration.