I spent three to four weeks building an AI voice agent that cold-calls HVAC owners for me. Real outbound calls. Real objection handling. Real appointments booked. When I listen back through the recordings, the owners almost never clock it as AI. They just think it is a human SDR with a slightly off cadence.
This is the build, and the honest reason it is sitting on the shelf for high-volume use right now.
The stack
Make.com runs the whole thing. Seven scenarios wired around a single Google Sheet called NeverMiss – Outbound Calls that acts as the CRM.
- Dialler — every five minutes, pulls one row where status=new, enabled=TRUE, attempts<3, and fires a call into Vapi
- Logger — webhook from Vapi after every call. Writes transcript, recording, and outcome back. Uses an OpenAI module to classify the transcript into structured outcomes
- Watch Dog — every ten minutes, catches rows where the call went out but the Logger webhook never fired. Flips them to logger-missed for retry
- Reconciler — sweeps stuck rows, enforces attempt caps, handles edge cases
- Recording Sync — pulls MP3s out of Vapi before the 30-day retention wipes them, stores in Drive
- Booking Handler — when the agent closes a booking mid-call, writes the calendar event and fires the confirmation
- CallBack Dialer — separate path for inbound callback requests where a form fill triggers a call within 60 seconds
One Vapi agent on the voice side. Four Twilio numbers rotating for caller ID reputation. ElevenLabs for the actual voice. The complexity is not in any one piece. It is in how tightly they are wired so nothing quietly breaks.
The Sheet is the CRM, with row locks
The Sheet has columns you would expect: business_name, phone_e164, status, call_id, recording_url, email, last_called_at, attempts, and notes.
And some you might not: lead_uid (UUID, stable key), lock_token plus lock_expires_at, next_callable_at, and logger_ran_at.
The lock columns are the interesting part. Before the dialler fires a call, it writes a lock token and an expiry where lock_expires_at = now + 10 minutes. That is the contract. If any other scenario or retry sees a row with a live lock, it skips that row. If the lock has expired, the row is free to reclaim.
Stops the Watch Dog from re-queueing a row that is actively being called. Stops two dialler runs from racing the same lead. Stops the double-call bugs you only catch in production after they have already annoyed a prospect.
This is the kind of thing you do not need until volume hits, and then you really need it.
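A minimal sketch of that lock contract, assuming an in-memory row rather than a real Sheet. The Row class and the function names are hypothetical; only the column names, the dialler filter, and the 10-minute expiry come from the description above. A spreadsheet backend has a gap between reading a row and writing the lock, which this sketch does not model.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

LOCK_TTL = timedelta(minutes=10)  # lock_expires_at = now + 10 minutes
MAX_ATTEMPTS = 3                  # dialler filter: attempts < 3


@dataclass
class Row:
    # Hypothetical in-memory stand-in for one Sheet row.
    lead_uid: str
    status: str = "new"
    enabled: bool = True
    attempts: int = 0
    lock_token: Optional[str] = None
    lock_expires_at: Optional[datetime] = None


def has_live_lock(row: Row, now: datetime) -> bool:
    """A lock counts only while its expiry is still in the future."""
    return (
        row.lock_token is not None
        and row.lock_expires_at is not None
        and row.lock_expires_at > now
    )


def try_claim(row: Row, now: datetime) -> Optional[str]:
    """Write a fresh lock and return its token, or None if the row must be skipped."""
    eligible = row.status == "new" and row.enabled and row.attempts < MAX_ATTEMPTS
    if not eligible or has_live_lock(row, now):
        return None
    row.lock_token = uuid.uuid4().hex
    row.lock_expires_at = now + LOCK_TTL
    return row.lock_token
```

A second try_claim on the same row one minute later returns None. Eleven minutes later the lock has expired and the row can be reclaimed.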
How it sounds like a human
Most AI voice builds fail on two things. The voice itself, and the gap between the human finishing their sentence and the agent replying. Both telegraph robot immediately.
For the voice. ElevenLabs inside Vapi, cloned from a real sample with permission. Tested OpenAI, Deepgram, and PlayHT alongside. ElevenLabs still wins on natural breath patterns and micro-pauses. The others sound synthesized. This one reads as human.
For the latency. Vapi exposes tunable voice activity detection and endpointing thresholds. The gap between you finishing a sentence and the agent starting its reply has to sit right where a human would land. Too short and it interrupts. Too long and the silence gives it away.
The system prompt runs about 1,100 words. The agent is named Alex. Rules include:
- Never read the prospect back to themselves
- Use natural fillers without overdoing them
- If asked whether this is a robot, be honest
- If asked for pricing, give a range and offer to email the details
- If told to stop calling, confirm and end gracefully
Specific objections each get short inline responses that flip back to a question. Nothing is read aloud. Everything is handled as conversation.
The agent can also call two tools mid-call. One books an appointment, which triggers the Booking Handler scenario to write the calendar event and email the confirmation. The other captures an email, writes it to the Sheet, and kicks off a follow-up sequence. When those tools fire, the agent pauses briefly, waits for the response, then reads the confirmation back to the prospect. No human in the loop on a successful call.
The glue: Logger, Watch Dog, Recording Sync
This is the boring but critical part.
Logger
Every Vapi call ends with a webhook hitting a Make scenario. The payload has the call ID, transcript, recording URL, end reason, and a Vapi-generated summary. The Logger finds the matching Sheet row by lead_uid, then runs the transcript through OpenAI to classify the outcome into the taxonomy used by the rest of the system:
- booked
- email_captured
- voicemail
- no_answer
- do_not_call
- no_interest
- callback_requested
It writes everything back to the row and fires downstream actions. Calendar events for bookings. DNC flags for opt-outs. Re-queue for voicemails up to the attempt cap.
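The classification step is only safe if the model's answer is forced back into that fixed taxonomy before anything downstream fires. A rough sketch of that guard, assuming the classifier returns a single text label. The function names and the action strings are hypothetical, and the real scenario runs in Make rather than Python.

```python
from enum import Enum


class Outcome(str, Enum):
    BOOKED = "booked"
    EMAIL_CAPTURED = "email_captured"
    VOICEMAIL = "voicemail"
    NO_ANSWER = "no_answer"
    DO_NOT_CALL = "do_not_call"
    NO_INTEREST = "no_interest"
    CALLBACK_REQUESTED = "callback_requested"


MAX_ATTEMPTS = 3  # same attempt cap the dialler uses


def parse_outcome(raw: str) -> Outcome:
    """Normalise the classifier's label and reject anything outside the taxonomy."""
    cleaned = raw.strip().lower().replace("-", "_").replace(" ", "_")
    return Outcome(cleaned)  # raises ValueError on an unknown label


def downstream_action(outcome: Outcome, attempts: int) -> str:
    """Map an outcome to a follow-up; the action names are placeholders."""
    if outcome is Outcome.BOOKED:
        return "create_calendar_event"
    if outcome is Outcome.DO_NOT_CALL:
        return "set_dnc_flag"
    if outcome is Outcome.VOICEMAIL and attempts < MAX_ATTEMPTS:
        return "requeue"
    return "none"
```

Raising on an unknown label, rather than guessing, keeps a malformed model reply from silently writing a bad status to the row.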
Watch Dog
Vapi occasionally fails to fire the post-call webhook. Without a watchdog, those rows sit silently in a dialling state forever while you think everything is fine.
Mine runs every ten minutes. It finds rows where last_called_at is older than five minutes but logger_ran_at is still empty, and flips status to logger-missed so the dialler picks them up again on the next pass. It also clears stale lock tokens so the row can be re-used.
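The sweep logic, sketched over in-memory rows. The Row class and sweep function are hypothetical stand-ins for the Make scenario. Skipping rows that still hold a live lock is an assumption carried over from the lock contract, not something the scenario is confirmed to do in this order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

GRACE = timedelta(minutes=5)  # last_called_at must be older than this


@dataclass
class Row:
    # Hypothetical in-memory stand-in for one Sheet row.
    lead_uid: str
    status: str
    last_called_at: Optional[datetime] = None
    logger_ran_at: Optional[datetime] = None
    lock_token: Optional[str] = None
    lock_expires_at: Optional[datetime] = None


def sweep(rows: List[Row], now: datetime) -> List[str]:
    """Clear stale locks and flip unlogged calls; returns the flipped lead_uids."""
    flipped = []
    for row in rows:
        lock_live = row.lock_expires_at is not None and row.lock_expires_at > now
        if row.lock_token is not None and not lock_live:
            # Stale lock: release it so the row can be reclaimed.
            row.lock_token = None
            row.lock_expires_at = None
        if lock_live:
            continue  # assumed: a call may still be in progress
        overdue = row.last_called_at is not None and now - row.last_called_at > GRACE
        if overdue and row.logger_ran_at is None and row.status != "logger-missed":
            row.status = "logger-missed"
            flipped.append(row.lead_uid)
    return flipped
```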
Recording Sync
Vapi deletes recordings after 30 days. That is not long enough. I want them forever to refine the prompt, spot objection patterns, and audit outcomes. So a Make scenario runs every four hours, pulls the MP3 via HTTP, uploads it to a Drive folder as <lead_uid>_<company>_<date>.mp3, writes the Drive URL back to the Sheet, and flips a synced flag.
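Company names contain spaces, ampersands, and apostrophes, so the filename needs a sanitising step. A small sketch of that naming pattern. The function name, the hyphen substitution, and the ISO date format are assumptions; only the <lead_uid>_<company>_<date>.mp3 shape comes from the scenario.

```python
import re
from datetime import date


def recording_filename(lead_uid: str, company: str, call_date: date) -> str:
    """Build <lead_uid>_<company>_<date>.mp3 with a filename-safe company part."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    safe_company = re.sub(r"[^A-Za-z0-9]+", "-", company).strip("-") or "unknown"
    return f"{lead_uid}_{safe_company}_{call_date.isoformat()}.mp3"
```

Keeping lead_uid first means the archive sorts by lead, and a later re-sync of the same call produces the same name.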
That permanent archive is what makes the next section possible.
Owners cannot tell it is AI. The actual numbers.
The recording sync is how I proved this instead of just claiming it. After 200 connected calls, I went back and tagged every recording on two questions. Did the prospect ever ask if it was a bot? Did they ever seem confused about who they were talking to?
11 out of 200 asked if it was AI. That is 5.5 percent.
And most of those 11 asked after Alex had already booked a callback or captured their email. Curiosity, not suspicion.
The other 94.5 percent had a normal conversation. Pushed back, negotiated, agreed to a follow-up, or politely declined. Same way they would with any human SDR. Successful bookings averaged three minutes and 40 seconds of call time. That is identical to a human SDR doing the same call, meaning prospects were not shortcutting the conversation because they smelled a robot.
I would share clips but they are real owners' voices without opt-in for public sharing. The data speaks for itself.
Why I have paused high-volume use
Here is the uncomfortable part.
Most HVAC numbers on public directories are not direct owner lines. They are main business lines routing through:
- An IVR menu (press 1 for service, 2 for billing, 3 for new installs)
- A call center or answering service
- A front-desk receptionist whose actual job is keeping people like me away from the owner
Current voice AI is still bad at IVRs. Vapi, Retell, Bland, the whole field. The agent hears press 1 for service and either sits silent until the menu times out, or tries to talk to a recording.
Human gatekeepers are a coin flip. A warm receptionist might put the call through. A suspicious one kills it with one question the agent does not have a clean answer for.
Until voice AI gets genuinely good at navigating phone trees (pressing digits intelligently, detecting robot-vs-human, adjusting strategy mid-call), high-volume outbound to public numbers burns money for an owner connection rate of roughly 10 to 15 percent.
So I have paused it. Not dead. Paused. The day voice AI handles IVRs cleanly, this whole stack comes right back off the shelf.
Why it still works
If you have direct owner mobile numbers, or any list of cell phones that do not route through IVRs and gatekeepers, this system is a weapon.
With direct lines, the cold-outbound economics actually pencil out.
Where direct mobiles come from ethically: LinkedIn scraping through Apollo, Clay, or Wiza. Network-sourced referral lists. Industry trade show attendee lists. Re-engagement campaigns against your own past leads and customers.
The system also works flawlessly for inbound callback. Someone fills a form, Alex calls within 60 seconds, qualifies, and books. Running that in production for specific clients right now. Book rates are silly.
What I am building instead
Inbound. AI receptionists catching the calls HVAC owners are already missing on nights, on weekends, and during the peak summer rush. Same voice agent pattern, flipped direction. Better unit economics, 100 percent connection rate (the prospects are the ones calling you).
That is what NeverMiss is now. If you run an HVAC, plumbing, or roofing business and want to hear what a modern AI receptionist sounds like when the call lands at 9:47 PM on a Saturday, the live demo does exactly that. You enter your business details and the system calls you back within 60 seconds with a receptionist trained on your company.
When voice AI finally cracks IVR navigation, the dialler comes off the shelf the same day.