Training: Supervised Fine-Tuning

Use this page when you want to fine-tune a base model on your agent’s high-scoring simulation transcripts. This is a secondary workflow — most users should start with the development loop first.

See the SFT training reference for the full setup: selecting transcripts, formatting the training dataset, launching the job.

If your team wants to use SFT but doesn’t have in-house fine-tuning expertise, reach out about enterprise support — we can partner on dataset selection, training setup, and evaluating the fine-tuned model.

When SFT makes sense

You already have a meaningful number of simulations scoring well against a trustworthy grader.
You want to distill a larger model’s behavior into a smaller, cheaper one.
You need lower latency or cost per turn in production.

When SFT doesn’t make sense

Your agent still has fixable prompt or tooling issues. Fix those first — you don’t want to train a smaller model on bad behavior.
Your scenario set doesn’t cover the distribution of real production traffic. SFT amplifies whatever is in the training data, including gaps.

Training: Supervised Fine-Tuning

When SFT makes sense

When SFT doesn’t make sense

See also