Technical Solution: Multi-Agent Voice System (LiveKit + Expertflow)

1. Overview

We built a multi-agent AI voice system that sits on top of LiveKit and integrates with Expertflow's CCM platform. The idea is simple: when a customer calls in, an AI agent picks up instantly, understands what the customer needs, and either handles it directly or routes them to the right specialist agent. If at any point the customer wants to speak to a real person, a live human agent is brought into the call and the AI steps aside gracefully.

The system runs on a completely self-hosted LiveKit setup: all of the real-time voice infrastructure (the server, the SIP service, and the agent workers) runs on our own machines, with no dependency on any external LiveKit cloud. For the AI brain, it connects to OpenAI's Realtime API.




2. What is LiveKit?

LiveKit is an open-source platform for real-time audio/video communication. Think of it as the backbone that manages voice rooms, participants, and media tracks. In our case it does three things:

  • Receives the inbound phone call via SIP

  • Hosts the "room" where the customer and agent meet

  • Lets us programmatically add/remove participants (including human agents)





3. LiveKit Components We Used

LiveKit Server: The core server that manages all rooms and participants. Every call creates one room, and the room lives until everyone disconnects. We self-hosted this on our infrastructure.

LiveKit SIP Service: A separate service that bridges regular phone calls (SIP protocol) into LiveKit rooms. When a customer dials in via FreeSWITCH, the SIP service converts that call into a LiveKit participant inside a room. It also carries SIP headers such as the customer's phone number, call ID, and media type into the room as participant attributes.

LiveKit Agents Framework: A Python SDK for writing AI agents that join rooms as participants. Our AI agents run as workers using this framework. When a new room is created, the framework automatically dispatches a worker to handle it.

LiveKit Dashboard: A web UI for monitoring rooms, participants, and active sessions in real time. Useful during demos and debugging — you can see exactly which participants are in which room at any moment.

Redis: LiveKit uses Redis internally for pub/sub messaging between its services (server, SIP service, agent workers). It is required for the multi-service setup to work: without Redis, the SIP service cannot communicate with the agent workers.




4. How a Call Joins the Room

When a customer dials the phone number, the call first hits FreeSWITCH, our telephony server. FreeSWITCH forwards the call to the LiveKit SIP Service, which is responsible for converting that regular phone call into a LiveKit participant: it creates a new room (using the call ID as the room name) and adds the customer to that room as a SIP participant.

Once the room is created, LiveKit Server notifies all registered agent workers through Redis. Our Python agent worker picks up this notification and joins the same room. At this point the customer and the AI agent are both inside the same room and the conversation begins.
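The sequence above can be modeled in a few lines. This is a conceptual sketch only, not the LiveKit API — the `Room`, `Participant`, and function names here are illustrative stand-ins for what the SIP service and agent worker actually do.

```python
# Conceptual model of the inbound-call flow: SIP service creates the room,
# adds the caller, then a worker is dispatched into the same room.
# Illustrative stand-ins only -- not the real LiveKit SDK classes.
from dataclasses import dataclass, field

@dataclass
class Participant:
    identity: str
    kind: str                               # "sip" for callers, "agent" for AI workers
    attributes: dict = field(default_factory=dict)

@dataclass
class Room:
    name: str
    participants: list = field(default_factory=list)

def handle_inbound_call(call_id: str, caller_number: str) -> Room:
    """SIP service: create a room named after the call ID and add the caller."""
    room = Room(name=call_id)
    room.participants.append(Participant(
        identity=f"sip_{caller_number}",
        kind="sip",
        # SIP headers are carried into the room as participant attributes
        attributes={"phone_number": caller_number, "call_id": call_id},
    ))
    return room

def dispatch_worker(room: Room) -> None:
    """Agent worker: notified of the new room (via Redis in the real system) and joins it."""
    room.participants.append(Participant(identity="ai-agent", kind="agent"))

room = handle_inbound_call("call-1234", "+15551234567")
dispatch_worker(room)
# Customer and AI agent are now in the same room; the conversation begins.
```

In the real deployment, the room creation and worker dispatch are handled entirely by the LiveKit SIP service and Agents framework; our code only implements the worker's entrypoint.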




5. Our Multi-Agent Use Case

The problem we solved: A single AI agent cannot be an expert in everything. We needed different AI personalities for Sales, Support, and Billing, but we didn't want separate calls or separate systems for each.

The solution: All agents run inside one single room and one single session. Only one agent is active at a time. Switching between them is just a swap of the active personality: the call never drops, and the customer never notices.




6. How the Multi-Agent Handoff Works

Each agent is a Python class with its own instructions (system prompt) and its own set of tools. Tools are functions the AI can call when it decides to.

When the AI decides a transfer is needed, it calls a transfer tool. That tool returns a new agent object back to the LiveKit framework. LiveKit sees this return value, deactivates the current agent, and activates the new one. The new agent's on_enter() method fires and it introduces itself to the caller.

No new room. No new SIP call. Same session, different personality.
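The handoff mechanism can be sketched with plain Python stand-ins. This is a minimal simulation of the pattern described above, assuming the framework treats an `Agent` returned from a tool as a handoff signal — the class and method names here are illustrative, not the actual LiveKit Agents SDK API.

```python
# Sketch of the handoff: a tool returns a new agent object, and the session
# swaps the active personality in place. Illustrative stand-ins for the
# LiveKit Agents framework, not the SDK's real classes.

class Agent:
    instructions = ""

    def on_enter(self) -> str:
        """Called when this agent becomes active; it introduces itself."""
        return f"{type(self).__name__} entering"

class BillingAgent(Agent):
    instructions = "You handle invoices and payments."

class GenericAgent(Agent):
    instructions = "You greet the caller and route to the right specialist."

    def transfer_to_billing(self) -> Agent:
        """Tool: the LLM calls this when billing help is needed.
        Returning an agent object signals the framework to hand off."""
        return BillingAgent()

class Session:
    """Same room, same call: only the active agent object changes."""
    def __init__(self, agent: Agent):
        self.active = agent
        self.log = [agent.on_enter()]

    def run_tool(self, tool_result):
        # If a tool returned an Agent, deactivate the current one,
        # activate the new one, and fire its on_enter().
        if isinstance(tool_result, Agent):
            self.active = tool_result
            self.log.append(tool_result.on_enter())

session = Session(GenericAgent())
session.run_tool(session.active.transfer_to_billing())
# session.active is now a BillingAgent; the call never dropped.
```

The key design point is that the swap is an in-memory object change, so nothing at the media layer (room, SIP leg, audio tracks) is touched.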




7. Agent Structure

We built four AI agents and a human agent slot:

  • GenericAgent — entry point, greets caller, routes to the right specialist

  • SalesAgent — handles pricing, demos, trials

  • SupportAgent — handles technical issues

  • BillingAgent — handles invoices and payments

  • Human Agent — real person, joins via SIP when requested

Each AI agent uses GPT-4o Realtime as its brain, which means it understands and speaks in real time with very low latency.



8. Human Agent Transfer

When a human is needed, we use the LiveKit SIP API to dial the human agent number. The human agent joins the existing room as a new SIP participant. Once they are confirmed connected, the AI session is closed and the human takes full control.

Even after the AI leaves, transcription continues using a standalone speech-to-text stream so the full conversation is logged.
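The ordering of the transfer steps matters, and can be sketched as below. `SipClient` is a hypothetical stand-in for the real LiveKit SIP API call that dials the human into the existing room; the point of the sketch is the sequence, not the API.

```python
# Sketch of the human-transfer sequence (ordering only).
# `SipClient` is a hypothetical stand-in for the LiveKit SIP API.

class SipClient:
    def __init__(self):
        self.calls = []

    def dial(self, number: str, room: str) -> None:
        # Real system: the SIP API adds the human agent to the existing room
        # as a new SIP participant.
        self.calls.append((number, room))

def transfer_to_human(sip: SipClient, room: str, human_number: str, events: list) -> None:
    sip.dial(human_number, room)             # 1. dial the human into the same room
    events.append("human_connected")         # 2. wait until connection is confirmed
    events.append("ai_session_closed")       # 3. only then close the AI session
    events.append("standalone_stt_started")  # 4. transcription continues without the AI

events: list = []
sip = SipClient()
transfer_to_human(sip, "call-1234", "+15557654321", events)
```

Closing the AI session only after the human is confirmed connected is what makes the handover feel seamless: the caller is never alone in the room.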




9. Conversation Logging (CCM Integration)

Every utterance by the customer, the AI, or the human agent is sent to Expertflow's CCM system in real time via HTTP.

During human transfer there is a brief window where the system queues messages instead of sending them immediately, to ensure they arrive in the correct order once the human is confirmed connected.
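The buffering logic during that window can be sketched as a small queue. The `send` method here is a placeholder for the real HTTP POST to Expertflow CCM; the class and method names are illustrative.

```python
# Sketch of transfer-window buffering: while a human transfer is in progress,
# utterances are queued and flushed in arrival order once the human is
# confirmed. `send` stands in for the real HTTP POST to Expertflow CCM.

class CcmLogger:
    def __init__(self):
        self.sent = []          # messages delivered to CCM, in order
        self.pending = []       # buffered during the transfer window
        self.transferring = False

    def send(self, message: str) -> None:
        # Assumption: a real implementation would POST to the CCM endpoint here.
        self.sent.append(message)

    def log(self, message: str) -> None:
        if self.transferring:
            self.pending.append(message)
        else:
            self.send(message)

    def begin_transfer(self) -> None:
        self.transferring = True

    def human_confirmed(self) -> None:
        # Flush queued messages in arrival order, then resume direct sending.
        self.transferring = False
        for message in self.pending:
            self.send(message)
        self.pending.clear()

logger = CcmLogger()
logger.log("customer: I need a human")
logger.begin_transfer()
logger.log("ai: connecting you now")        # buffered, not sent yet
logger.human_confirmed()                    # flushed in order
logger.log("human: hello, how can I help?")
```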




10. Shared State Across Agents

Since multiple agents handle the same call, they all need access to the same information. We solve this with a SessionState object that is created once at the start of the call and passed to every agent via userdata. It holds the customer ID, call ID, SIP headers, pending messages, and audio track reference.

No agent ever loses context when a handoff happens because they all read from the same shared object.
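A minimal sketch of such a SessionState, assuming only the fields named above (the real object and the exact field names may differ):

```python
# Sketch of the shared per-call state object. Created once at call start and
# handed to every agent via userdata; field names here are assumptions based
# on the description above.
from dataclasses import dataclass, field

@dataclass
class SessionState:
    customer_id: str
    call_id: str
    sip_headers: dict = field(default_factory=dict)
    pending_messages: list = field(default_factory=list)
    audio_track: object = None   # reference to the active audio track

state = SessionState(customer_id="cust-42", call_id="call-1234",
                     sip_headers={"phone_number": "+15551234567"})

# Every agent holds a reference to the SAME object, so a handoff loses nothing:
generic_view = state
billing_view = state
billing_view.pending_messages.append("invoice question")
# generic_view.pending_messages now contains the same entry.
```

Because agents share one object rather than copies, anything one agent records is immediately visible to the next agent after a handoff.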




11. Summary of Key Design Decisions

  • Single room for all agents — no call drops, seamless experience

  • GPT-4o Realtime — ultra-low latency, no separate TTS pipeline needed

  • Shared SessionState — all agents keep context across handoffs

  • Redis — required for LiveKit multi-service communication

  • Standalone STT after the AI closes — transcription continues even after the AI leaves

  • Message queueing during transfer — ensures correct message order in CCM