
LiveKit Agents

LiveKit Agents is a self-hosted voice runtime: the orchestration loop (endpointing, turn-taking, the LLM call, TTS) runs inside an AgentSession in your own process, and the media plane is WebRTC end-to-end through a LiveKit server (the SFU). You don’t open a WebSocket to a vendor — you run a LiveKit server, register a worker, and the worker is auto-dispatched into rooms over WebRTC. Everything lives in your pod.

This is a different integration shape from the trace-native frameworks on the overview page. There’s no Python entry point for Veris to call into, and the actor can’t speak WebRTC directly; instead the Veris actor reaches your agent over the voice_ws channel, a small bridge translates voice_ws ↔ WebRTC, and tool visibility comes from events you report (see Tool calls and grading), not native trace ingestion.

Architecture

Three processes run in the sandbox pod, launched by start.sh: a self-hosted livekit-server (the SFU, dev mode, on :7880), the Agents worker (app.agent), and a voice_ws bridge (app.web) that listens on the actor’s port. The bridge is what the actor talks to — voice_ws can’t reach LiveKit directly because LiveKit is WebRTC end-to-end.

A room is created when the actor opens WS /voice: the bridge joins the SFU as participant veris-actor and publishes the actor’s incoming PCM16 as a LiveKit mic track. The SFU then auto-dispatches the registered worker into that room, and entrypoint(ctx) starts an AgentSession joined to ctx.room. The actor’s audio is one participant, the agent’s audio is the other, and the bridge relays bytes both ways. None of this needs a veris-sandbox change — the actor drives the standard voice_ws channel.

What runs in your pod: the SFU, the worker (with its AgentSession, LLM, TTS, VAD, turn-taking, and your @function_tool implementations), and the bridge. There is no vendor media server — livekit-server is baked into the image and run in dev mode (livekit-server --dev --bind 0.0.0.0). The realtime model is reached over outbound HTTPS to OpenAI (openai.realtime.RealtimeModel); the only declared sandbox service is postgres.

Channel: voice_ws

Declare the actor channel as voice_ws pointing at the bridge’s WebSocket. The actor speaks raw PCM16 bytes at 24 kHz mono — the default binary framing (voice_ws also offers a json envelope); the bridge re-slices LiveKit’s ~10 ms output frames up to 20 ms frames for the actor.

.veris/veris.yaml
```yaml
version: "1.0"
mini-bcs-voice-livekit-env:
  services:
    - name: postgres
      config:
        SCHEMA_PATH: /agent/db/schema.sql
  actor:
    channels:
      - type: voice_ws
        url: ws://localhost:8008/voice
  agent:
    name: Mini BCS Voice (LiveKit)
    code_path: /agent
    # start.sh launches the three in-container processes: livekit-server (the
    # SFU), the Agents worker, and the voice_ws bridge that the actor talks to
    # on :8008. No veris-sandbox changes: the actor drives standard voice_ws.
    entry_point: bash /agent/start.sh
    environment:
      DATABASE_URL: postgresql://postgres:postgres@localhost:5432/veris
      LIVEKIT_URL: ws://localhost:7880
      LIVEKIT_API_KEY: devkey
      LIVEKIT_API_SECRET: secret
      # OPENAI_API_KEY drives the LiveKit OpenAI Realtime plugin. Inject once:
      #   veris env vars set OPENAI_API_KEY=sk-... --secret
```

Unlike the trace-native frameworks, the entry_point is bash /agent/start.sh — a supervisor for the three processes, not a single server command. See the voice_ws reference for the full field list, protocol options (binary vs json), and audio contract.

The worker / AgentSession

LiveKit configuration is not hosted state: the agent is defined in code and runs as a library inside your pod. The worker is an AgentServer started in prod mode (python -m app.agent start), and each dispatched room runs an rtc_session entrypoint that constructs an AgentSession against ctx.room.

app/agent.py
```python
import logging
import os

from livekit.agents import Agent, AgentServer, AgentSession, JobContext, RunContext, cli
from livekit.agents.llm import ToolError, function_tool
from livekit.plugins import openai

logger = logging.getLogger(__name__)

# load_fnc always reports 0 so this dedicated, one-call-per-pod worker never
# self-throttles. By default a `start` (prod-mode) worker uses a CPU-based load
# function with a 0.7 threshold; when the SFU + worker + bridge + Realtime
# session share one CPU-bound sandbox pod under concurrent cluster load, that
# threshold trips and the SFU reports "no workers with sufficient capacity",
# so the agent never joins the room and the actor sees callee_no_answer.
# (dev mode defaults the threshold to inf for exactly this reason.)
server = AgentServer(load_fnc=lambda *_: 0.0)


@server.rtc_session()
async def entrypoint(ctx: JobContext) -> None:
    ctx.log_context_fields = {"room": ctx.room.name}
    logger.info("[entrypoint] joining room=%s", ctx.room.name)

    session = AgentSession(
        llm=openai.realtime.RealtimeModel(
            voice=os.environ.get("REALTIME_VOICE", "alloy"),
            model=os.environ.get("REALTIME_MODEL", "gpt-realtime"),
        ),
    )
    # RileyAgent is defined elsewhere in this file (see the excerpts below).
    await session.start(agent=RileyAgent(), room=ctx.room)
    logger.info("[entrypoint] session started room=%s", ctx.room.name)


if __name__ == "__main__":
    cli.run_app(server)
```

The agent speaks first: RileyAgent.on_enter() calls session.generate_reply(...) to greet the caller as soon as the session joins the room.

app/agent.py
```python
class RileyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=AGENT_PROMPT)
        self._api = BCSAPI()

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions="Greet the caller as Riley from Acme Bank's credit card team and ask how you can help, in one short sentence."
        )
```

The registration gate

Startup ordering is load-bearing. The actor opening /voice is what creates a room, and the SFU can only dispatch the worker into that room if the worker has already registered. So start.sh starts the SFU, waits for :7880, starts the worker, then gates the bridge on the worker actually registering before accepting any caller.

The gate polls the worker’s log for the registered worker line — it is not a fixed sleep. The worker registers with the SFU asynchronously, and under concurrent cluster load that can take 10s+. A sleep 5 races: if the bridge accepts a caller before the worker registers, the room is created before the worker can be auto-dispatched into it, the agent never joins, and the actor reports callee_no_answer.

start.sh
```bash
# 2. Agents worker: registers with the SFU so it can be dispatched. Capture
# its log so we can gate on *actual* registration, and mirror it to stdout so
# it still lands in agent.log for debugging.
uv run --no-sync python -m app.agent start > /tmp/worker.log 2>&1 &
WK_PID=$!
tail -f /tmp/worker.log 2>/dev/null &
TAIL_PID=$!

# Gate the bridge on the worker actually registering with the SFU, NOT a fixed
# sleep. Under concurrent cluster load the worker can take 10s+ to register; if
# the bridge accepts a caller before then, the room is created before the worker
# can be auto-dispatched into it, so the agent never joins and the actor reports
# callee_no_answer. Wait (up to 60s) for the "registered worker" log line.
echo "[start] waiting for the agent worker to register with the SFU..."
for _ in $(seq 1 120); do
  grep -q "registered worker" /tmp/worker.log 2>/dev/null && {
    echo "[start] worker registered — bringing up voice_ws bridge"
    break
  }
  kill -0 "$WK_PID" 2>/dev/null || {
    echo "[start] worker exited before registering" >&2
    break
  }
  sleep 0.5
done

# 3. voice_ws bridge / web server.
uv run --no-sync uvicorn app.web:app --host 0.0.0.0 --port "$PORT" &
WEB_PID=$!
```
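The same gate logic can be sketched in Python, for example to exercise it in a unit test. This is an illustration only and not part of start.sh: wait_for_registration, its parameters, and the is_alive callback are hypothetical names. The "registered worker" log line, the 0.5 s poll interval, and the 60 s budget are the only details taken from the script above. Unlike the shell loop, this sketch returns a boolean so the caller can tell a successful registration from a timeout or an early worker exit.

```python
import time
from pathlib import Path
from typing import Callable


def wait_for_registration(
    log_path: Path,
    is_alive: Callable[[], bool],
    needle: str = "registered worker",
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
) -> bool:
    """Poll log_path until needle appears.

    Returns True once the line is seen, False if the worker dies first or the
    timeout elapses. All names here are hypothetical.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # The log may not exist yet if the worker has only just been spawned.
        if log_path.exists() and needle in log_path.read_text(errors="replace"):
            return True
        # Mirrors `kill -0 "$WK_PID"`: stop waiting once the worker has exited.
        if not is_alive():
            return False
        time.sleep(poll_s)
    return False
```

A caller would pass something like `lambda: worker_proc.poll() is None` as is_alive, where worker_proc is the subprocess handle for the worker.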

The two LiveKit dispatch failures, a bridge that accepts callers before the worker registers and a worker that self-throttles through its load function, surface identically as callee_no_answer and only bite under concurrent cluster load: a single local smoke test passes and the problem appears at scale. If a handful of parallel sims pass but a larger batch shows roughly half failing to connect, suspect the registration gate and the load_fnc throttle (see The worker / AgentSession and Sharp edges) before anything in the bridge or audio path. Both fixes live in the agent’s own code; neither needs a veris-sandbox change.

The script supervises all three processes with wait -n plus a cleanup trap, and deliberately runs without set -e, which would kill the shell on the first child’s non-zero exit before cleanup runs. See Pattern 9 (transport bridge) for the canonical pattern.
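To make the supervision semantics concrete, here is a minimal Python rendering of the "wait for the first child to exit, then bring everything down" behaviour. It is a sketch, not a translation of start.sh (which is shell and is not shown in full here): supervise, poll_s, and grace_s are hypothetical names, and polling stands in for bash's wait -n.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def supervise(
    commands: Sequence[Sequence[str]],
    poll_s: float = 0.05,
    grace_s: float = 5.0,
) -> int:
    """Start every command, return the exit code of the first child to exit,
    and stop the remaining children before returning."""
    procs: List[subprocess.Popen] = [subprocess.Popen(list(cmd)) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code  # first exit decides the container's fate
            time.sleep(poll_s)
    finally:
        # Plays the role of the cleanup trap: runs on normal return and on
        # KeyboardInterrupt alike.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


if __name__ == "__main__":
    exit_code = supervise(
        [
            [sys.executable, "-c", "import time; time.sleep(30)"],  # long-lived child
            [sys.executable, "-c", "import sys; sys.exit(3)"],      # fails quickly
        ]
    )
    print(exit_code)
```

A non-zero child exit is handled as data here and never aborts the supervisor, which is the same reason the shell script avoids set -e.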

Tools

Tools are @function_tool()-decorated async methods on the RileyAgent(Agent) class, holding shared state via self._api = BCSAPI() (constructed in __init__). They run in-process on the worker — there’s no schema-on-record / impl-in-process split, and no inbound HTTP.

app/agent.py
```python
class RileyAgent(Agent):
    # A read-only lookup: one success path, reported once.
    @function_tool()
    async def display_card_info_by_last4(
        self, context: RunContext, last4: str
    ) -> dict:
        """Find a card by the last 4 digits and return its details. Returns {} if not found."""
        card = self._api.find_card_by_last4(last4)
        result = card.model_dump() if card else {}
        report_tool_call("display_card_info_by_last4", {"last4": last4}, result)
        return result

    # A mutating tool: validate, report on both paths, raise ToolError on failure.
    @function_tool()
    async def change_card_status(
        self, context: RunContext, card_id: str, new_status: str
    ) -> dict:
        """Update a card's status. A cancelled card cannot change status."""
        args = {"card_id": card_id, "new_status": new_status}
        try:
            card = self._api.update_card_status(card_id, CardStatus(new_status))
            result = card.model_dump() if card else {}
            report_tool_call("change_card_status", args, result)
            return result
        except ValueError as exc:
            report_tool_call("change_card_status", args, {"error": str(exc)})
            raise ToolError(str(exc))
```

Read-only lookups report their single success path. Mutating tools — change_card_status, request_card_replacement — wrap the BCSAPI call in try/except ValueError, report an {"error": ...} result, and re-raise as ToolError so the model gets a clean failure (see Tool calls and grading).

Tool calls and grading

@function_tool tools execute in-process on the worker and never appear in the actor’s audio transcript, so the grader can’t see them — real, completed actions get flagged as fabricated. Report each call to the engine so it lands in the graded trace:

app/agent.py
```python
import asyncio
import json
import logging
import os

import httpx

logger = logging.getLogger(__name__)

_ENGINE_URL = os.environ.get("ENGINE_URL", "http://localhost:6100")
_SIMULATION_ID = os.environ.get("SIMULATION_ID")

# Strong references to in-flight report tasks, so they are not
# garbage-collected before they finish.
_report_tasks: set[asyncio.Task] = set()


def _emit_tool_event(name: str, args: dict, result: object) -> None:
    body = json.dumps(
        {
            "service": "agent",
            "event_type": "agent_tool_call",
            "data": {"name": name, "arguments": args, "result": result},
        },
        default=str,  # enums/datetimes: same handling as the tool result
    )
    try:
        httpx.post(
            f"{_ENGINE_URL}/simulations/{_SIMULATION_ID}/events",
            content=body,
            headers={"Content-Type": "application/json"},
            timeout=2.0,
        )
    except Exception as exc:
        logger.warning("[tool] could not report %s to engine: %s", name, exc)


def report_tool_call(name: str, args: dict, result: object) -> None:
    """Fire-and-forget report of a tool call to the Veris engine.

    No-op outside a simulation. Runs the blocking POST in a worker thread so it
    never stalls the realtime voice loop, even if the engine is slow/unreachable.
    """
    if not _SIMULATION_ID:
        return
    task = asyncio.create_task(asyncio.to_thread(_emit_tool_event, name, args, result))
    _report_tasks.add(task)
    task.add_done_callback(_report_tasks.discard)
```

SIMULATION_ID and ENGINE_URL are injected by the sandbox; an unset SIMULATION_ID is the signal to no-op. The event_type must be exactly agent_tool_call (the renderer keys on it), and the POST runs in a worker thread via asyncio.to_thread so the blocking call never stalls the realtime voice loop. This is the canonical voice-agent tool-reporting pattern; see Tool call reporting in the voice_ws reference.
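To see what actually goes over the wire, the payload construction can be isolated from the HTTP call. The sketch below reuses the event shape and default=str from the reporting code; build_tool_event is a hypothetical helper name, and CardStatus here is a stand-in plain Enum, since the agent's real enum definition is not shown.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class CardStatus(Enum):  # stand-in for the agent's own enum (assumption)
    ACTIVE = "active"


def build_tool_event(name: str, args: dict, result: object) -> str:
    """Serialize one tool call in the shape posted to the events endpoint."""
    return json.dumps(
        {
            "service": "agent",
            "event_type": "agent_tool_call",
            "data": {"name": name, "arguments": args, "result": result},
        },
        default=str,  # anything json cannot encode natively becomes str(value)
    )


body = build_tool_event(
    "change_card_status",
    {"card_id": "c-1", "new_status": "active"},
    {
        "status": CardStatus.ACTIVE,
        "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    },
)
print(body)
```

One consequence worth knowing: with default=str a plain Enum member is rendered as CardStatus.ACTIVE, not as its value "active", and a datetime is rendered as 2024-01-01 00:00:00+00:00. If the graded trace should show the raw values, convert the result before reporting it.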

The audio bridge

The bridge (app.web) owns the voice_ws audio contract. It publishes the actor’s incoming PCM16 as a LiveKit mic track, subscribes to the agent’s audio track, and relays it back — re-slicing LiveKit’s ~10 ms (480-byte) AudioStream frames up to the 20 ms (960-byte) frames the actor expects. The wire contract is PCM16, 24,000 Hz, mono, 20 ms frames (FRAME_SAMPLES = 480, FRAME_BYTES = 960).
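The re-slicing step amounts to buffering incoming bytes and emitting fixed-size frames. The sketch below shows one way to do that with the constants from the wire contract; FrameReslicer and push are hypothetical names, and the bridge's actual implementation is not shown in this page.

```python
from typing import List

SAMPLE_RATE = 24_000      # Hz, mono
BYTES_PER_SAMPLE = 2      # PCM16
FRAME_SAMPLES = 480       # 20 ms at 24 kHz
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE  # 960


class FrameReslicer:
    """Accumulate PCM16 chunks of any size and emit fixed 20 ms frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES) -> None:
        self._frame_bytes = frame_bytes
        self._buf = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every complete frame now available."""
        self._buf.extend(chunk)
        frames: List[bytes] = []
        while len(self._buf) >= self._frame_bytes:
            frames.append(bytes(self._buf[: self._frame_bytes]))
            del self._buf[: self._frame_bytes]
        return frames


reslicer = FrameReslicer()
first = reslicer.push(b"\x01" * 480)   # one ~10 ms chunk: not enough for a frame yet
second = reslicer.push(b"\x02" * 480)  # second chunk completes one 20 ms frame
print(len(first), len(second), len(second[0]))
```

Leftover bytes stay in the buffer until the next chunk arrives, so the output is always whole 960-byte frames regardless of the input chunk size.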

Unlike a speak-only vendor stream, the agent’s LiveKit track is continuous WebRTC media — it already carries the silence between utterances, so the actor’s server-side VAD commits each turn on its own. There is no per-turn trailing-silence pump to maintain. The one explicit silence is a 200 ms END_OF_CALL_SILENCE burst flushed at hangup, so the actor’s VAD commits the final turn before the socket closes.
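The size of that hangup burst follows directly from the wire contract: 200 ms at 24,000 Hz is 4,800 samples, or 9,600 bytes of PCM16, which is ten 20 ms frames. A minimal sketch of generating it, where silence_frames and END_OF_CALL_SILENCE_MS are hypothetical names (the source only names a 200 ms END_OF_CALL_SILENCE burst):

```python
from typing import List

SAMPLE_RATE = 24_000          # Hz, mono
BYTES_PER_SAMPLE = 2          # PCM16
FRAME_BYTES = 960             # one 20 ms frame
END_OF_CALL_SILENCE_MS = 200  # assumed constant name for the 200 ms burst


def silence_frames(duration_ms: int) -> List[bytes]:
    """Return enough all-zero PCM16 frames to cover duration_ms."""
    total_bytes = SAMPLE_RATE * duration_ms // 1000 * BYTES_PER_SAMPLE
    n_frames = -(-total_bytes // FRAME_BYTES)  # ceiling division
    # bytes(n) is n zero bytes, and zero-valued samples are PCM16 silence.
    return [bytes(FRAME_BYTES) for _ in range(n_frames)]


burst = silence_frames(END_OF_CALL_SILENCE_MS)
print(len(burst), sum(len(f) for f in burst))
```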

Runtime env vars

veris env vars set OPENAI_API_KEY=sk-... --secret

OPENAI_API_KEY drives the LiveKit OpenAI Realtime plugin and is the only secret you must inject. LIVEKIT_URL, LIVEKIT_API_KEY (devkey), and LIVEKIT_API_SECRET (secret) are dev-mode values set in veris.yaml (and defaulted in start.sh); the LiveKit AgentServer/cli machinery reads them from the environment. REALTIME_VOICE (default alloy) and REALTIME_MODEL (default gpt-realtime) are optional overrides; ENGINE_URL and SIMULATION_ID are set by the sandbox in-sim.
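The variables and defaults listed above can be summarized as a small loader. This is a sketch for orientation, not code from the agent: RuntimeConfig and load_config are hypothetical names, and in the real setup the LiveKit values are read by the AgentServer/cli machinery, not by application code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    openai_api_key: str
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    realtime_voice: str
    realtime_model: str
    engine_url: str
    simulation_id: Optional[str]  # None outside a simulation


def load_config(env: Mapping[str, str] = os.environ) -> RuntimeConfig:
    """Resolve runtime settings, failing fast on the one required secret."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    return RuntimeConfig(
        openai_api_key=env["OPENAI_API_KEY"],
        # Dev-mode values, matching veris.yaml.
        livekit_url=env.get("LIVEKIT_URL", "ws://localhost:7880"),
        livekit_api_key=env.get("LIVEKIT_API_KEY", "devkey"),
        livekit_api_secret=env.get("LIVEKIT_API_SECRET", "secret"),
        # Optional overrides.
        realtime_voice=env.get("REALTIME_VOICE", "alloy"),
        realtime_model=env.get("REALTIME_MODEL", "gpt-realtime"),
        # Set by the sandbox in-sim.
        engine_url=env.get("ENGINE_URL", "http://localhost:6100"),
        simulation_id=env.get("SIMULATION_ID"),
    )


config = load_config({"OPENAI_API_KEY": "sk-test"})
print(config.realtime_model, config.simulation_id)
```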

Dockerfile

The base-image layer is thicker than a trace-native agent’s: in addition to the Python deps it bakes in the livekit-server binary (pinned for reproducible, CI-stable builds — the get.livekit.io installer floats “latest” via the unauthenticated GitHub API and rate-limits on CI). See the Dockerfile.sandbox reference.

.veris/Dockerfile.sandbox
```dockerfile
ARG VERIS_BASE
FROM ${VERIS_BASE}

# livekit-server: the in-container SFU the agent worker and the /voice bridge
# both connect to on localhost:7880. Pinned to a fixed release for reproducible,
# CI-stable builds. Dropped in /usr/local/bin (on PATH; no sudo needed as root).
ARG LIVEKIT_SERVER_VERSION=1.12.0
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates tar \
    && curl -fsSL "https://github.com/livekit/livekit/releases/download/v${LIVEKIT_SERVER_VERSION}/livekit_${LIVEKIT_SERVER_VERSION}_linux_$(dpkg --print-architecture).tar.gz" \
       | tar -xz -C /usr/local/bin livekit-server \
    && rm -rf /var/lib/apt/lists/*

# Python deps first for layer caching.
COPY pyproject.toml uv.lock /agent/
WORKDIR /agent
RUN uv sync --frozen --no-dev

# Agent code + runtime assets.
COPY app /agent/app
COPY agent_desc.txt /agent/agent_desc.txt
COPY db/schema.sql /agent/db/schema.sql
COPY ui /agent/ui
COPY start.sh /agent/start.sh
WORKDIR /agent
```

Sharp edges

| Gotcha | Why it bites |
| --- | --- |
| Gate the bridge on the worker’s registered worker log line, not a sleep | Auto-dispatch only fires for rooms created after the worker registers; under load registration takes 10s+. Accept a caller too early and the room exists but the agent never joins → callee_no_answer. |
| Pin load_fnc=lambda *_: 0.0 on the prod-mode AgentServer | A start worker ships a CPU load function with a 0.7 threshold and refuses dispatch when busy; with the SFU + worker + bridge + realtime session sharing one pod it trips under load and the SFU reports “no workers with sufficient capacity”. (dev mode defaults the threshold to inf.) |
| start.sh runs without set -e | set -e would kill the supervisor on the first child’s non-zero exit before the cleanup/trap runs. The script uses wait -n + trap to bring the container down as a unit instead. |
| Both dispatch failures only bite under concurrent load | A single local smoke test passes; ~half of a larger parallel batch can fail to connect. Suspect the registration gate and load_fnc before the bridge or audio path. |
| livekit-server version must be pinned in the Dockerfile | The get.livekit.io installer resolves “latest” via the unauthenticated GitHub API, which floats between builds and rate-limits on CI. |
