Most teams trying to build their own AI receptionist think the hard part is the AI.

It's not. The AI is the easy part now.

The hard part is everything around the AI. The part that doesn't show up in demos or tutorials. The part that takes six to eight months to figure out and breaks every time it goes near production.

I've watched a few teams try to build this themselves. They all hit the same wall.

1000 ms: total latency budget per response
6-8 months: orchestration buildout before production
8 layers: hidden under the 30-second demo

What they think they're building

You watch a Vapi or Retell demo. Agent answers a call, takes a booking, sends a confirmation. Looks simple.

So they think the build is:

A weekend project.

What they're actually building

Here's what's underneath that 30-second demo.

Telephony layer. SIP trunking. Carrier integration. STIR/SHAKEN attestation so calls don't get marked as spam. Inbound number provisioning. Outbound caller ID verification. DTMF detection. Call recording compliance per state (one-party vs. two-party consent).

Audio infrastructure. Voice activity detection that doesn't false-trigger on background noise. Barge-in handling so the agent stops talking when the caller interrupts. Echo cancellation. Silence detection. Dropped audio recovery.

Latency budget. The whole call has a 1000ms response window before it sounds robotic. That 1000ms gets split across speech-to-text, LLM inference, tool calls, text-to-speech, telephony round trip. Each one has to be optimized. Miss the budget and customers hang up.
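To make the budget concrete, here is a minimal sketch of how that 1000 ms window might be split and checked. The stage names come from the article; the per-stage allocations are my illustrative assumptions, not any provider's real numbers.

```python
# Hypothetical split of the 1000 ms response window across pipeline stages.
# Allocations are illustrative; real systems tune these per deployment.
BUDGET_MS = {
    "speech_to_text": 200,
    "llm_inference": 350,
    "tool_call": 150,
    "text_to_speech": 200,
    "telephony_round_trip": 100,
}

def stages_over_budget(measured_ms: dict) -> list:
    """Return the stages that blew their slice of the budget."""
    return [stage for stage, limit in BUDGET_MS.items()
            if measured_ms.get(stage, 0) > limit]

print(stages_over_budget({"speech_to_text": 180, "llm_inference": 520}))
```

The point of writing it down as a table is that every millisecond you give one stage is a millisecond taken from another; there is no slack to absorb an unoptimized step.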

Tool reliability. The agent calls your CRM to book an appointment. The API times out at 8 seconds. Agent already said "perfect, you're booked for Thursday." Customer gets no confirmation. Shows up. No record. Trust gone.
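The fix for that failure mode is an ordering rule: the agent speaks the confirmation only after the write lands. A minimal sketch, where `write_booking` stands in for any CRM call with a short timeout (all names here are hypothetical):

```python
def book_appointment(write_booking, slot: str) -> str:
    """write_booking is any callable that persists the booking (e.g. a CRM
    API call with a short timeout) and raises on failure. Illustrative only.
    The rule: confirm ONLY after the write succeeds, never before."""
    try:
        write_booking(slot)
    except Exception:
        # The failure mode from the text: never say "you're booked" on a timeout.
        return ("I'm having trouble reaching the calendar. "
                "Can I take your number and text you once it's confirmed?")
    return f"Perfect, you're booked for {slot}."
```

The design choice is that the error path degrades honestly instead of guessing; a caller who gets a callback is annoyed, a caller who shows up to a nonexistent appointment is gone.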

State management. Call drops mid-conversation. Customer calls back. How does the agent know they were already 80% through booking? Handoff between inbound and outbound. Retry logic. Idempotency so the same booking doesn't get created twice.
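Idempotency here usually means deriving a stable key from the request itself, so the same booking attempt maps to the same record no matter how many times it is retried. A sketch, assuming caller phone plus time slot identifies a booking (a simplification; real systems key on more fields and store the seen-set in the database, not process memory):

```python
import hashlib

_processed = set()  # in production: a unique index in the database

def booking_key(phone: str, slot: str) -> str:
    """Same caller + same slot always hashes to the same key."""
    return hashlib.sha256(f"{phone}|{slot}".encode()).hexdigest()

def create_booking_once(phone: str, slot: str) -> bool:
    """True if a new booking was created; False if this exact booking
    already exists, so a retry or a second call-in can't double-book."""
    key = booking_key(phone, slot)
    if key in _processed:
        return False
    _processed.add(key)
    return True
```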

Escalation logic. When does the agent transfer to a human? When does it just take a message? How does it handle threats, lawsuits, contract disputes, refund demands? These aren't AI problems. They're product problems with hard rules.
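"Hard rules" means exactly that: deterministic checks that run before the model gets a say. A toy sketch; the trigger phrases and action names are my illustrative assumptions, not a production ruleset:

```python
# Trigger phrases and actions are illustrative, not a shipped ruleset.
ESCALATION_RULES = [
    ({"sue", "lawsuit", "lawyer"}, "transfer_to_owner"),
    ({"refund", "chargeback"}, "transfer_to_human"),
    ({"contract", "dispute"}, "take_message_flag_urgent"),
]

def route(transcript: str) -> str:
    """Deterministic escalation check, run before any LLM decision."""
    words = set(transcript.lower().split())
    for triggers, action in ESCALATION_RULES:
        if words & triggers:
            return action
    return "continue_with_agent"
```

Real implementations are fuzzier than word matching, but the principle holds: the categories that can cost you a customer or a lawsuit are decided by rules you wrote, not by model temperature.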

Monitoring. How do you know the agent is failing? You can't watch every call. You need three layers: system health (uptime, error rates), leading indicators (transfer rate, low-confidence responses), business outcomes (bookings, conversion, revenue).
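Those three layers can be as simple as one boolean each. A sketch with made-up thresholds (every cutoff below is an illustrative assumption; tune against your own baselines):

```python
def health_report(m: dict) -> dict:
    """One pass/fail per monitoring layer. All thresholds are illustrative."""
    return {
        "system_health": m["error_rate"] < 0.02 and m["uptime"] >= 0.999,
        "leading_indicators": m["transfer_rate"] < 0.15
                              and m["low_confidence_rate"] < 0.10,
        "business_outcomes": m["booking_conversion"] >= 0.30,
    }
```

The layering matters because the failures arrive in that order: error rates spike first, transfer rates creep up next, and by the time bookings drop you have already lost revenue.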

Model and data drift. The LLM provider updates their model. Agent behavior shifts subtly. Nobody notices for two weeks. You find out when bookings drop 15%.
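The two-week blind spot closes with a drift check comparing a recent window against a rolling baseline. A minimal sketch using the article's 15% figure as the trigger (the relative-drop formulation and the assumption that baseline > 0 are mine):

```python
def drift_alert(baseline: float, recent: float, max_drop: float = 0.15) -> bool:
    """Fire when the recent booking rate falls more than max_drop (relative)
    below the rolling baseline. Assumes baseline > 0."""
    return (baseline - recent) / baseline > max_drop
```

Run it daily on a seven-day window and the 15% drop surfaces in days, not weeks.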

The build-vs-buy moment

This is the conversation I have with operators who think they want to build it themselves.

They're not wrong about the AI. Anyone can prompt an LLM to sound friendly on the phone.

They're wrong about the rest.

I talked to a guy who'd been building his own setup for 8 months. He had the agent working great in test calls. The moment he tried to ship it into production, everything broke.

His telephony provider's webhook signing wasn't matching. His CRM API was throwing 500s on bookings during peak hours. His agent was confirming bookings before the API actually wrote them, so customers got told they had appointments that didn't exist. His latency was 2.4 seconds because he was running STT → LLM → TTS sequentially instead of streaming.
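The webhook signing mismatch is worth a sketch, because it is the first thing that breaks on contact with production. Most telephony providers sign webhook payloads with some variant of HMAC-SHA256; the exact header name, encoding, and signed payload differ per provider, so treat this as a generic outline, not any specific provider's scheme:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, raw_body: bytes, received_sig: str) -> bool:
    """Generic HMAC-SHA256 webhook check. The header name, hex vs. base64
    encoding, and what exactly gets signed vary by provider -- check their
    docs before copying this."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time compare avoids leaking the signature via timing.
    return hmac.compare_digest(expected, received_sig)
```

The classic mismatch: verifying against the parsed-and-reserialized JSON instead of the raw request bytes. Sign-and-compare only ever works on the exact bytes the provider sent.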

He asked me how long it took us to solve those problems.

About a year of running it in production with real shops.

He stopped trying to build his own.

Why this matters if you're shopping

If you're an operator looking at AI receptionist providers, the question isn't "do you have an AI that sounds good." Every provider sounds good in the demo.

The question is "what happens when something goes wrong."

Ask them:

What happens when the CRM API times out mid-call? What's the end-to-end latency budget, and how do they hit it? What happens when a call drops halfway through a booking? How do they catch it when the model provider ships an update and behavior drifts? What do they monitor beyond uptime?

Most cheap providers can't answer these. They shipped the demo. They didn't ship the production system.

The difference between a $300/month AI receptionist and one that actually works is everything underneath the conversation.

The takeaway

Building the AI is no longer the hard part. The infrastructure around it is.

If you're an operator, ask the harder questions before you buy. The conversation quality is table stakes. The orchestration is what determines whether the agent actually books the job.

If you're a builder thinking about competing in this space, plan for six to eight months on the orchestration before you ship. Or pick a different problem. This one is solved by people who have already taken the lumps.

If you want to see what running the orchestration looks like from the operator side, my last long-form was on how I replaced hours of manual work with a self-hosted AI agent — same NeverMiss, different stack, full build log including the security layer most tutorials skip.