Training: Supervised Fine-Tuning
Use this page when you want to fine-tune a base model on your agent’s high-scoring simulation transcripts. This is a secondary workflow — most users should start with the development loop first.
See the SFT training reference for the full setup: selecting transcripts, formatting the training dataset, launching the job.
If your team wants to use SFT but doesn’t have in-house fine-tuning expertise, reach out about enterprise support — we can partner on dataset selection, training setup, and evaluating the fine-tuned model.
When SFT makes sense
- You already have a meaningful number of simulations scoring well against a trustworthy grader.
- You want to distill a larger model’s behavior into a smaller, cheaper one.
- You need lower latency or cost per turn in production.
When SFT doesn’t make sense
- Your agent still has fixable prompt or tooling issues. Fix those first — you don’t want to train a smaller model on bad behavior.
- Your scenario set doesn’t cover the distribution of real production traffic. SFT amplifies whatever is in the training data, including gaps.
See also
- Training reference — full SFT workflow
- Training — Reinforcement Learning
- Development loop — start here first