How long did it take to build the AI cold-caller system?

Three to four weeks of iterative build and tuning. The Make.com workspace runs seven scenarios wired around a single Google Sheet, with a Vapi voice agent on the call side and Twilio handling the phone numbers. Most of the time went into tuning the voice agent's prompt, handling objections naturally, and building out the watchdog scenarios that catch failures.

Can business owners tell they are talking to AI?

Across 200 connected outbound calls, 11 owners asked if they were speaking with AI. That is 5.5 percent. Most of those 11 asked after the agent had already booked a callback or captured an email, suggesting curiosity rather than suspicion. The other 94.5 percent held normal conversations, pushed back, negotiated, agreed to follow-ups, or politely declined, the same way they would with a human SDR.

Why was the high-volume cold-calling use paused?

Most HVAC phone numbers on public directories route through IVR menus, call centers, or receptionists whose job is keeping callers away from the owner. Current voice AI still struggles with IVRs, so connection rates to actual owners drop to around 10 to 15 percent when calling main business lines. High-volume cold calling to public numbers burns money until voice AI gets better at navigating phone trees.

When does the AI cold-caller still work well?

When you have direct owner mobile numbers instead of main business lines. On direct mobiles, connection rates run 40 to 55 percent and book rates on connected calls run 8 to 14 percent, with an all-in cost of about 18 cents per connected minute. It also works flawlessly for inbound callback scenarios where a form fill triggers a call back within 60 seconds.

What tools does the build use?

Make.com orchestrates the whole system across seven scenarios. Vapi hosts the voice agent. Twilio provides the phone numbers. ElevenLabs generates the actual voice. OpenAI classifies call transcripts into structured outcomes inside the Logger scenario. Google Sheets acts as the CRM with row-level lock tokens for concurrency, and Google Drive stores the recording archive.

How does the system prevent two scenarios from calling the same lead at once?

Each row in the Sheet has a lock_token column and a lock_expires_at column. Before the dialler fires a call, it writes a lock token and an expiry set to now plus ten minutes. Any other scenario or retry that sees a row with a live lock skips it. If the lock has expired, the row is free to reclaim. This row-level distributed lock prevents double-calls and race conditions when volume ramps up.

How I Built an AI Cold-Caller for HVAC Outbound (Full Make.com Build)

I spent three to four weeks building an AI voice agent that cold-calls HVAC owners for me. Real outbound calls. Real objection handling. Real appointments booked. When we listen back through the recordings, the owners almost never clock it as AI. They just think it is a human SDR with a slightly off cadence.

This is the build, and the honest reason it is sitting on the shelf for high-volume use right now.

The stack

Make.com runs the whole thing. Seven scenarios wired around a single Google Sheet called NeverMiss – Outbound Calls that acts as the CRM.

dialler — every five minutes, pulls one row where status=new, enabled=TRUE, attempts<3, and fires a call into Vapi
Logger — webhook from Vapi after every call. Writes transcript, recording, and outcome back. Uses an OpenAI module to classify the transcript into structured outcomes
Watch Dog — every ten minutes, catches rows where the call went out but the Logger webhook never fired. Flips them to logger-missed for retry
Reconciler — sweeps stuck rows, enforces attempt caps, handles edge cases
Recording Sync — pulls MP3s out of Vapi before the 30-day retention wipes them, stores in Drive
Booking Handler — when the agent closes a booking mid-call, writes the calendar event and fires the confirmation
CallBack Dialer — separate path for inbound callback requests where a form fill triggers a call within 60 seconds

One Vapi agent on the voice side. Four Twilio numbers rotating for caller ID reputation. ElevenLabs for the actual voice. The complexity is not in any one piece. It is in how tightly they are wired so nothing quietly breaks.
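
The dialler's row filter (status=new, enabled=TRUE, attempts<3) can be sketched in Python. This is only an illustration of the selection logic, not the Make.com scenario itself. It assumes each Sheet row has been read into a dict keyed by column name, and the function names are hypothetical.

```python
from typing import Optional

MAX_ATTEMPTS = 3  # attempt cap taken from the dialler description


def is_callable(row: dict) -> bool:
    """True when a row matches the dialler filter: status=new, enabled=TRUE, attempts<3."""
    return (
        row.get("status") == "new"
        and str(row.get("enabled", "")).upper() == "TRUE"
        and int(row.get("attempts") or 0) < MAX_ATTEMPTS
    )


def pick_next_row(rows: list[dict]) -> Optional[dict]:
    """Return the first callable row, or None when nothing is due."""
    for row in rows:
        if is_callable(row):
            return row
    return None
```

Pulling a single row per run keeps each five-minute pass to at most one outbound call.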

The Sheet is the CRM, with row locks

The Sheet has columns you would expect: business_name, phone_e164, status, call_id, recording_url, email, last_called_at, attempts, and notes.

And some you might not: lead_uid (UUID, stable key), lock_token plus lock_expires_at, next_callable_at, and logger_ran_at.

The lock columns are the interesting part. Before the dialler fires a call, it writes a lock token and an expiry where lock_expires_at = now + 10 minutes. That is the contract. If any other scenario or retry sees a row with a live lock, it skips that row. If the lock has expired, the row is free to reclaim.

Stops the Watch Dog from re-queueing a row that is actively being called. Stops two dialler runs from racing the same lead. Stops the double-call bugs you only catch in production after they have already annoyed a prospect.

This is the kind of thing you do not need until volume hits, and then you really need it.
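
A minimal Python sketch of that lock contract, assuming a row is held as a dict and timestamps are timezone-aware datetimes. The helper names are hypothetical. In a real Sheet the read and the write are separate steps, so this check is not atomic on its own. One option is to re-read the row and confirm the token still matches before dialling.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_TTL = timedelta(minutes=10)  # lock_expires_at = now + 10 minutes


def lock_is_live(row: dict, now: datetime) -> bool:
    """A lock is live when a token is present and its expiry is still in the future."""
    expires = row.get("lock_expires_at")
    return bool(row.get("lock_token")) and expires is not None and expires > now


def try_acquire_lock(row: dict, now: Optional[datetime] = None) -> Optional[str]:
    """Write a fresh token and expiry unless a live lock exists. Returns the token or None."""
    now = now or datetime.now(timezone.utc)
    if lock_is_live(row, now):
        return None  # another run owns this row, so skip it
    token = str(uuid.uuid4())
    row["lock_token"] = token
    row["lock_expires_at"] = now + LOCK_TTL
    return token
```

Because the expiry is stored on the row, a scenario that crashes mid-call cannot hold a lead hostage. The lock simply ages out and the next run reclaims the row.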

How it sounds like a human

Most AI voice builds fail on two things. The voice itself, and the gap between the human finishing their sentence and the agent replying. Both telegraph robot immediately.

For the voice. ElevenLabs inside Vapi, cloned from a real sample with permission. Tested OpenAI, Deepgram, and PlayHT alongside. ElevenLabs still wins on natural breath patterns and micro-pauses. The others sound synthesized. This one reads as human.

For the latency. Vapi exposes tunable voice activity detection and endpointing thresholds. The gap between you finishing a sentence and the agent starting its reply has to sit right where a human would land. Too short and it interrupts. Too long and the silence gives it away.

The system prompt runs about 1,100 words. The agent is named Alex. Rules include:

Never read the prospect back to themselves
Use natural fillers without overdoing them
If asked whether this is a robot, be honest
If asked for pricing, give a range and offer to send email details
If told to stop calling, confirm and end gracefully

Specific objections each get short inline responses that flip back to a question. Nothing is read aloud. Everything is handled as conversation.

The agent can also call two tools mid-call. One books an appointment, which triggers the Booking Handler scenario to write the calendar event and email the confirmation. The other captures an email, writes it to the Sheet, and kicks off a follow-up sequence. When those tools fire, the agent pauses briefly, waits for the response, then reads the confirmation back to the prospect. No human in the loop on a successful call.

The glue: Logger, Watch Dog, Recording Sync

This is the boring but critical part.

Logger

Every Vapi call ends with a webhook hitting a Make scenario. The payload has the call ID, transcript, recording URL, end reason, and a Vapi-generated summary. The Logger finds the matching Sheet row by lead_uid, then runs the transcript through OpenAI to classify the outcome into the taxonomy used by the rest of the system:

booked
email_captured
voicemail
no_answer
do_not_call
no_interest
callback_requested

It writes everything back to the row and fires downstream actions. Calendar events for bookings. DNC flags for opt-outs. Re-queue for voicemails up to the attempt cap.
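
The downstream routing can be sketched as a small dispatch function. This is an illustration under assumptions, not the actual Logger scenario: the action names are hypothetical, and so are two modelling choices, setting enabled to FALSE for a DNC flag and resetting status to new for a re-queue. The OpenAI classification step is left out, so the function takes the outcome label as input.

```python
MAX_ATTEMPTS = 3  # same attempt cap the dialler uses

OUTCOMES = {
    "booked", "email_captured", "voicemail", "no_answer",
    "do_not_call", "no_interest", "callback_requested",
}


def apply_outcome(row: dict, outcome: str) -> list[str]:
    """Update the row for a classified outcome and return the downstream actions to fire."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    actions: list[str] = []
    row["status"] = outcome
    if outcome == "booked":
        actions.append("create_calendar_event")
    elif outcome == "do_not_call":
        row["enabled"] = "FALSE"  # assumption: DNC is modelled by disabling the row
        actions.append("flag_dnc")
    elif outcome == "voicemail":
        if int(row.get("attempts") or 0) < MAX_ATTEMPTS:
            row["status"] = "new"  # assumption: re-queue means the dialler can pick it up again
            actions.append("requeue")
    return actions
```

Rejecting labels outside the taxonomy is a deliberate choice here, so that an unexpected classifier output fails loudly instead of writing a status nothing else in the system understands.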

Watch Dog

Vapi occasionally fails to fire the post-call webhook. Without a watchdog, those rows sit silently in a dialling state forever while you think everything is fine.

Mine runs every ten minutes. It finds rows where last_called_at is older than five minutes but logger_ran_at is still empty, and flips status to logger-missed so the dialler picks them up again on the next pass. It also clears stale lock tokens so the row can be re-used.
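
A rough Python sketch of that sweep, assuming rows are dicts with timezone-aware datetimes. Skipping rows that still hold a live lock is my reading of the lock contract, and clearing the lock only once it has expired is likewise an assumption about what "stale" means here.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=5)  # how long to wait for the Logger webhook


def sweep_missed(rows: list[dict], now: datetime) -> list[dict]:
    """Flag rows whose call went out but whose Logger webhook never arrived."""
    flipped = []
    for row in rows:
        called = row.get("last_called_at")
        if called is None or row.get("logger_ran_at"):
            continue  # never called, or the Logger already ran
        expires = row.get("lock_expires_at")
        if row.get("lock_token") and expires is not None and expires > now:
            continue  # live lock: the call may still be in progress
        if now - called > GRACE:
            row["status"] = "logger-missed"
            row["lock_token"] = ""  # clear the stale lock so the row can be reused
            row["lock_expires_at"] = None
            flipped.append(row)
    return flipped
```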

Recording Sync

Vapi deletes recordings after 30 days. That is not long enough. I want them forever to retrain the prompt, spot objection patterns, and audit outcomes. So a Make scenario runs every four hours, pulls the MP3 via HTTP, uploads it to a Drive folder with the filename <lead_uid>_<company>_<date>.mp3, writes the Drive URL back to the Sheet, and flips a synced flag.
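
The archive filename pattern can be built like this. It is a sketch only: the ISO date format and the company-name sanitizing are assumptions, and the function name is hypothetical.

```python
import re
from datetime import date


def archive_filename(lead_uid: str, company: str, call_date: date) -> str:
    """Build <lead_uid>_<company>_<date>.mp3 with a filesystem-friendly company name."""
    # Collapse anything that is not a letter or digit into a single hyphen.
    safe_company = re.sub(r"[^A-Za-z0-9]+", "-", company).strip("-")
    return f"{lead_uid}_{safe_company}_{call_date.isoformat()}.mp3"


print(archive_filename("abc123", "Bob's Heating & Air", date(2024, 7, 4)))
# abc123_Bob-s-Heating-Air_2024-07-04.mp3
```

Leading with lead_uid keeps every file traceable to one Sheet row even when two companies share a name.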

That permanent archive is what makes the next section possible.

Owners cannot tell it is AI. The actual numbers.

The recording sync is how I proved this instead of just claiming it. After 200 connected calls, I went back and tagged every recording on two questions. Did the prospect ever ask if it was a bot? Did they ever seem confused about who they were talking to?

11 out of 200 asked if it was AI. That is 5.5 percent.

And most of those 11 asked after Alex had already booked a callback or captured their email. Curiosity, not suspicion.

The other 94.5 percent had a normal conversation. Pushed back, negotiated, agreed to a follow-up, or politely declined. Same way they would with any human SDR. Successful bookings averaged three minutes and 40 seconds of call time. That is identical to a human SDR doing the same call, meaning prospects were not shortcutting the conversation because they smelled a robot.

I would share clips but they are real owners' voices without opt-in for public sharing. The data speaks for itself.

Why I have paused high-volume use

Here is the uncomfortable part.

Most HVAC numbers on public directories are not direct owner lines. They are main business lines routing through:

An IVR menu (press 1 for service, 2 for billing, 3 for new installs)
A call center or answering service
A front-desk receptionist whose actual job is keeping people like me away from the owner

Current voice AI is still bad at IVRs. Vapi, Retell, Bland, the whole field. The agent hears press 1 for service and either sits silent until the menu times out, or tries to talk to a recording.

Human gatekeepers are a coin flip. A warm receptionist might put the call through. A suspicious one kills it with one question the agent does not have a clean answer for.

Until voice AI gets genuinely good at navigating phone trees (pressing digits intelligently, detecting robot-vs-human, adjusting strategy mid-call) high-volume outbound to public numbers burns money for roughly 10 to 15 percent owner connection rates.

So I have paused it. Not dead. Paused. The day voice AI handles IVRs cleanly, this whole stack comes right back off the shelf.

Why it still works

If you have direct owner mobile numbers, or any list of cell phones that do not route through IVRs and gatekeepers, this system is a weapon.

40-55% connection rate on direct owner mobiles
8-14% book rate on connected calls
~$0.18 all-in cost per connected minute (Vapi + Twilio + ElevenLabs + OpenAI)

Those are cold-outbound economics that actually pencil.
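
To put those ranges in terms of dials, here is a quick back-of-envelope calculation in Python. It uses only the connection and book rates quoted above and treats them as independent, which is a simplifying assumption.

```python
def dials_per_booking(connection_rate: float, book_rate: float) -> float:
    """Expected dials needed for one booking, given both rates as fractions."""
    return 1.0 / (connection_rate * book_rate)


best = dials_per_booking(0.55, 0.14)   # top of both ranges
worst = dials_per_booking(0.40, 0.08)  # bottom of both ranges
print(f"{best:.1f} to {worst:.1f} dials per booking")
# 13.0 to 31.2 dials per booking
```

So on direct mobiles, one booking takes roughly 13 to 31 dials.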

Where direct mobiles come from ethically: LinkedIn scraping through Apollo, Clay, or Wiza. Network-sourced referral lists. Industry trade show attendee lists. Re-engagement campaigns against your own past leads and customers.

The system also works flawlessly for inbound callback. Someone fills a form, Alex calls within 60 seconds, qualifies, and books. Running that in production for specific clients right now. Book rates are silly.

What I am building instead

Inbound. AI receptionists catching the calls HVAC owners are already missing on nights, on weekends, and during the peak summer rush. Same voice agent pattern, flipped direction. Better unit economics, 100 percent connection rate (the prospects are the ones calling you).

That is what NeverMiss is now. If you run an HVAC, plumbing, or roofing business and want to hear what a modern AI receptionist sounds like when the call lands at 9:47 PM on a Saturday, the live demo does exactly that. You enter your business details and the system calls you back within 60 seconds with a receptionist trained on your company.

When voice AI finally cracks IVR navigation, the dialler comes off the shelf the same day.
