B2B SaaS Outbound
The v1 hero score. Two agents in a chain — cold-leads finds and contacts, reply-triage classifies and advances. Both write to the same Pipeline. The Pipelines UI is the operator’s view of what the score is doing in real time.
What the score does
Sunday night you sign up, define your ICP, connect Gmail + Apollo, and pick this score. Monday at 09:00 the cold-leads agent runs. By Wednesday you have replies. By Friday you have a booked demo. The Pipeline shows you exactly what Maestro did, separate from any other system you have.
That’s the bar. The score is a single page on the wall — when it works, this is the case study; when it doesn’t, here’s what to fix.
Agents
    cold-leads ──── handoff ───→ reply-triage
        │                              │
        └─────── writes to ────────────┘
                       │
                       ▼
               pipeline pl_saas
Cold leads · SaaS
Runs on cron (default `0 9 * * MON-FRI`, 9am ET weekdays). Each run:
- Apollo: `find_leads` matching the ICP (titles, seniorities, headcount, location). Default `per_page` 10.
- Pipeline: `find_contact` to skip anyone already in the pipeline.
- Apollo: `enrich_domain` for company context (funding, tech stack, headcount).
- Compose: `draft_personalized_opener` grounded in the candidate + company signal.
- Pipeline + (Gmail): see Review mode vs send mode below.
- Notify: fire one summary notification (`send_attention` in review mode, `send_event` in send mode).
Hard cap: 3 drafts per run in v1. Raise this once the agent has been validated on real prospects for a week. The cap is enforced by the agent’s instructions, not by the runtime.
Skills: apollo, compose, gmail, pipeline, notify.
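The selection part of a run (skip anyone already in the pipeline, stop at the draft cap) can be sketched as a plain loop. This is a minimal illustration, not Maestro's implementation: the `Lead` dataclass, the `plan_drafts` function, and the in-memory `known_emails` set are hypothetical stand-ins for the Apollo and Pipeline skills. Only the skip-existing rule and the cap of 3 come from the text.

```python
from dataclasses import dataclass

MAX_DRAFTS_PER_RUN = 3  # v1 hard cap from the text


@dataclass(frozen=True)
class Lead:
    """Hypothetical shape for a lead returned by find_leads."""
    name: str
    email: str


def plan_drafts(candidates, known_emails, cap=MAX_DRAFTS_PER_RUN):
    """Return the leads that would get a draft in this run.

    candidates   -- leads in the order find_leads returned them
    known_emails -- lowercase emails already in the pipeline
                    (stands in for a find_contact lookup)
    """
    selected = []
    for lead in candidates:
        if len(selected) >= cap:
            break  # cap reached: stop drafting for this run
        if lead.email.lower() in known_emails:
            continue  # already in the pipeline: skip
        selected.append(lead)
    return selected


candidates = [
    Lead("Ana", "ana@example.com"),
    Lead("Bo", "bo@example.org"),
    Lead("Cy", "cy@example.net"),
    Lead("Di", "di@example.io"),
    Lead("Ed", "ed@example.dev"),
]
picked = plan_drafts(candidates, known_emails={"bo@example.org"})
print([lead.name for lead in picked])  # ['Ana', 'Cy', 'Di']
```

In the real score the cap lives in the agent's instructions, so a sketch like this is only useful for reasoning about what a well-behaved run should produce.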
Review mode vs send mode
The cold-leads agent ships in review mode by default. The “Mode:” line at the top of the agent’s instructions controls which branch the LLM follows:
- Review mode (default): agent drafts the opener, writes the contact at stage `ready`, logs a `note` activity with `payload.draft: true` containing the full subject + body. After the run, fires `notify.send_attention` (“3 drafts ready for review”). The operator opens each contact’s detail page, reviews/edits the draft inline, clicks Send via Gmail. The send is dispatched by the API (not the agent) via the same Gmail OAuth bundle; the activity flips in place to `kind: contacted`, the contact’s stage advances to `contacted`. Operator is in the loop.
- Send mode: agent calls `gmail.send_email` directly during the run, writes the contact at stage `contacted`, logs a `contacted` activity with `{ subject, preview, gmail_message_id, gmail_thread_id }`. After the run, fires `notify.send_event`. No human in the loop.
To switch modes, edit the cold-leads agent’s instructions: replace the “REVIEW branch” block with the “SEND branch” block (the inactive-mode branch is included in the seed instructions for reference). Save. Next run uses the new mode.
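The difference between the two branches comes down to which stage and which activity get written for each drafted opener. A minimal sketch, assuming a hypothetical `contact_write` helper and a 120-character preview (the text does not say how `preview` is derived); the stage names, activity kinds, and payload keys are the ones described above.

```python
def contact_write(mode, subject, body, gmail_ids=None):
    """Return (stage, activity) for one drafted opener.

    mode      -- "review" or "send"
    gmail_ids -- (message_id, thread_id) from the send; required in send mode
    """
    if mode == "review":
        # Draft is stored on a note activity; nothing is sent yet.
        activity = {
            "kind": "note",
            "payload": {"draft": True, "subject": subject, "body": body},
        }
        return "ready", activity
    if mode == "send":
        if gmail_ids is None:
            raise ValueError("send mode needs the Gmail message and thread ids")
        message_id, thread_id = gmail_ids
        activity = {
            "kind": "contacted",
            "payload": {
                "subject": subject,
                "preview": body[:120],  # assumed preview length
                "gmail_message_id": message_id,
                "gmail_thread_id": thread_id,
            },
        }
        return "contacted", activity
    raise ValueError(f"unknown mode: {mode!r}")


stage, activity = contact_write("review", "Quick question", "Hi Ana, ...")
print(stage, activity["kind"])  # ready note
```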
The recommended onboarding path: ship review mode for the first ~5–10 successful runs (you build trust in the LLM’s drafts and your ICP filter precision), then flip to send mode once the drafts are consistently good without edits.
Reply triage
Runs on cron (default `*/30 * * * *`, every 30 minutes). Each run:
- Gmail: `list_inbox` for `is:unread` threads.
- For each thread:
  - Pipeline: `find_contact` matching the sender’s email. Skip if not a tracked contact.
  - Skip if the contact’s stage is already `replied`, `triaged`, `booked`, `disqualified`, or `unsubscribed` (already handled).
  - Gmail: `read_thread` to get the latest message body.
  - Compose: `classify_reply_intent`. Returns one of `interested`, `not_interested`, `out_of_office`, `wrong_person`, `unsubscribe`, `needs_review`.
  - Pipeline: `log_activity` with kind `triaged`, `new_stage` mapped from the intent, payload `{ intent, confidence, explanation }`.
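The two skip rules in the per-thread loop can be sketched as a filter over the unread threads. This is an illustration under assumptions: the thread dict shape and the `threads_to_classify` function are hypothetical, and a plain dict stands in for the Pipeline `find_contact` lookup. The list of already-handled stages is taken from the text.

```python
# Stages that mean the reply was already handled, as listed in the text.
HANDLED_STAGES = {"replied", "triaged", "booked", "disqualified", "unsubscribed"}


def threads_to_classify(unread_threads, stage_by_email):
    """Return the ids of unread threads that still need classification.

    unread_threads -- list of {"id": ..., "sender": ...} dicts (assumed shape)
    stage_by_email -- maps a tracked contact's lowercase email to its stage
    """
    pending = []
    for thread in unread_threads:
        stage = stage_by_email.get(thread["sender"].lower())
        if stage is None:
            continue  # sender is not a tracked contact
        if stage in HANDLED_STAGES:
            continue  # already handled on an earlier run
        pending.append(thread["id"])
    return pending


threads = [
    {"id": "t1", "sender": "ana@example.com"},       # contacted, needs triage
    {"id": "t2", "sender": "stranger@example.org"},  # not in the pipeline
    {"id": "t3", "sender": "cy@example.net"},        # already replied
]
stages = {"ana@example.com": "contacted", "cy@example.net": "replied"}
print(threads_to_classify(threads, stages))  # ['t1']
```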
Critical constraint: reply-triage does not draft replies. Classification and stage advancement only. The human writes the actual response. This is a deliberate v1 design — keeping the human in the loop on every send protects your sender reputation while we build the voice-modeling and feedback loops needed to draft replies that sound like you. Reply drafting is on the roadmap for v2.
Skills: gmail, compose, pipeline.
Handoff
Cold-leads names reply-triage as its handoff target. In v1, both agents run on independent cron schedules and coordinate through the Pipeline (cold-leads writes contacts; reply-triage reads them). Automatic handoff-on-completion is on the roadmap.
Stage mapping
The score defines this stage progression for every contact:
    new → enriching → ready → contacted → replied → triaged → booked
                              contacted → disqualified
                              contacted → unsubscribed
| Event | Stage transition | Logged activity |
|---|---|---|
| cold-leads sends opener | new → contacted | contacted |
| reply-triage classifies “interested” or “needs_review” | contacted → replied | triaged |
| reply-triage classifies “not_interested” / “wrong_person” | contacted → disqualified | triaged |
| reply-triage classifies “unsubscribe” | contacted → unsubscribed | triaged |
| reply-triage classifies “out_of_office” | unchanged | none (will reprocess on next run) |
| operator manually books a meeting | replied → booked | manual note (UI not yet wired) |
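`new`, `enriching`, and `ready` are intended for richer pre-contact pipelines (e.g. when an enrichment agent fronts the cold-leads agent). v1 cold-leads skips `new` and `enriching`: in send mode it writes contacts straight to `contacted`, and in review mode it writes them at `ready` until the operator sends the draft.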
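The reply-triage rows of the table reduce to a lookup from intent to stage. A minimal sketch, assuming a hypothetical `apply_intent` helper; the intent names, target stages, and the "no activity for out_of_office" behavior are taken from the table above.

```python
# Intent -> new stage, per the stage-mapping table. None means "leave unchanged".
INTENT_TO_STAGE = {
    "interested": "replied",
    "needs_review": "replied",
    "not_interested": "disqualified",
    "wrong_person": "disqualified",
    "unsubscribe": "unsubscribed",
    "out_of_office": None,
}


def apply_intent(current_stage, intent):
    """Return (new_stage, activity_kind) for a classified reply.

    activity_kind is None when nothing is logged, which leaves the thread
    eligible for reprocessing on the next run.
    """
    if intent not in INTENT_TO_STAGE:
        raise ValueError(f"unknown intent: {intent!r}")
    new_stage = INTENT_TO_STAGE[intent]
    if new_stage is None:
        return current_stage, None
    return new_stage, "triaged"


print(apply_intent("contacted", "interested"))     # ('replied', 'triaged')
print(apply_intent("contacted", "out_of_office"))  # ('contacted', None)
```

Customizing the mapping in the reply-triage instructions amounts to changing the values in a table like this one.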
Configuring for your install
The seed agents ship with example sender context and an example ICP. To use the score for your own outreach:
- Edit the cold-leads agent instructions (Maestro → Agents → Cold leads · SaaS → Edit). Replace the sender section, the ICP filters, and the bonus signals with your own. The runtime hands these instructions to Claude every iteration; the agent acts on what’s in this text.
- Edit the reply-triage agent instructions if you want to customize the intent → stage mapping.
- Add the secrets the skills need:
  - `apollo_api_key` (Apollo skill; see docs/skills/apollo.md)
  - `google_oauth_client_id` + `google_oauth_client_secret` and connect Gmail (see docs/skills/gmail.md)
  - `ANTHROPIC_API_KEY` is configured at install time
- Set the agents’ `status` to `running` (or `scheduled` if you want them on cron only) when you’re ready to go live. They ship as `idle` so configuration happens before any emails go out.
Cost estimate
Per cold-leads run with 3 sends:
| Operation | Calls | Approx cost |
|---|---|---|
| `apollo.find_leads` | 1 | covered by Apollo plan |
| `apollo.enrich_domain` | ~3 | covered by Apollo plan |
| `compose.draft_personalized_opener` (Haiku) | 3 | $0.0015 |
| `gmail.send_email` | 3 | free (your own Gmail) |
| `pipeline.*` writes | ~7 | free (Postgres) |
| Anthropic agent loop (~25 iterations) | 1 run | ~$0.05 (Sonnet 4.6) |
So ~$0.05 per cold-leads run, dominated by the agent loop’s Sonnet calls. Set MAESTRO_MODEL=claude-haiku-4-5-20251001 to drop this to ~$0.005/run if Haiku reasoning is good enough for the orchestration logic.
Reply-triage runs are smaller (~$0.02 each) because there’s less per-thread work.
At the default schedule (cold-leads weekday mornings, reply-triage every 30 min), monthly Anthropic spend lands around $5–10 before scaling sends.
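To adapt the estimate to your own schedule, the arithmetic is runs per day times active days times cost per run. The sketch below works through the cold-leads side only, using the per-run figures from the table; the `monthly_cost` helper and the figure of 22 weekdays per month are assumptions for illustration.

```python
def monthly_cost(runs_per_day, days_per_month, cost_per_run):
    """Approximate monthly spend in dollars for one agent."""
    return runs_per_day * days_per_month * cost_per_run


# Per-run cost from the table: agent loop (~$0.05) + opener drafts ($0.0015).
COLD_LEADS_PER_RUN = 0.05 + 0.0015

# One run per weekday; 22 weekdays per month is an assumed average.
cold_leads = monthly_cost(1, 22, COLD_LEADS_PER_RUN)
print(round(cold_leads, 2))  # 1.13
```

The same function applies to reply-triage. Its total depends on how many of the every-30-minute runs actually find unread threads to process, which the per-run figure above does not break down.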
Verifying the score works
After installing (and with the agent in review mode, the default):
- Run cold-leads manually from the dashboard. Watch the run timeline; it should show ~20 LLM steps + tool calls (no Gmail sends in review mode, so a few fewer steps than send mode).
- Bell icon should show an unread `attention` notification: “3 drafts ready for review”.
- Click into the SaaS pipeline. Three new contacts should be there at stage `ready`.
- Open one of the contacts. The draft (subject + full body) renders in a brass-bordered review card above the activity timeline. Read it.
- Optional: edit the subject or body inline.
- Click Send via Gmail. Within ~2 seconds the card disappears, the activity flips to `contacted`, the contact’s stage advances to `contacted`, and the email lands in your Gmail Sent folder.
- Reply to that email yourself with “yes, would Tuesday at 2pm work?” Run reply-triage manually.
- Bell icon pings: “Sarah Chen replied — needs response”.
- Pipelines UI: the contact advanced to stage `replied` with a `triaged` activity logged.
That’s the full hero-score loop, end-to-end, on real infrastructure with the human review safety net.
After ~5–10 review-mode runs that produce sendable drafts without significant edits, flip the agent to send mode (see Review mode vs send mode above) and runs become fully autonomous.
Out of v1 scope
- Auto-drafted replies. Stays human in v1.
- Paced sending across hours (the `gmail.queue_paced_send` operation). v1 sends in-loop during the cold-leads run; pacing across a day requires a persistent send queue + scheduler. Lands in a future release.
- Multi-pipeline scores. A single score writes to a single pipeline. To run “Cold leads · Healthcare” alongside SaaS, clone the agents to a second `cold-leads-health` agent pair.
- Handoff auto-triggering. Cold-leads names reply-triage in `handoffToAgentId` but the runtime doesn’t auto-trigger it on completion. Both agents run on independent crons in v1.
- In-app notifications when intent=‘interested’. Activity is logged; the UI doesn’t yet pop a banner. Lands in a future release.