Maestro

Compose

The Compose skill is Maestro’s LLM-backed writing helper. Two operations, both backed by Anthropic Claude:

  • draft_personalized_opener writes a cold-outreach email tailored to one lead.
  • classify_reply_intent triages a reply into one of six intents so the reply-triage agent knows whether to advance the contact, escalate to a human, or stop.

These are the first operations in the catalog that call a language model directly — the others (Gmail, Apollo, web-research) are deterministic. The model used is recorded in every result, so swapping defaults later doesn’t silently change behaviour for existing runs.

What this skill does

Both operations are shipped:

  • draft_personalized_opener — subject + body for one lead, grounded in their recent signal (funding, hiring, blog post, …). Uses Anthropic tool-use to force structured output.
  • classify_reply_intent — classifies a reply into interested / not_interested / out_of_office / wrong_person / unsubscribe / needs_review, with confidence and a one-sentence explanation.

Setup

The skill uses the same Anthropic API key the runtime already needs. If your runtime is up and running agents, Compose works automatically — ANTHROPIC_API_KEY from your .env is picked up via the SDK’s env-first secret store.

If you want to override per-skill (for example, billing the cold-leads agent’s drafts to a different account), add anthropic_api_key as a vault secret in Secrets. Vault values take precedence over the env var for skill operations.

Cost

Both operations default to Haiku 4.5 (claude-haiku-4-5-20251001) — Anthropic’s fast, cheap model. Typical per-call costs:

  • draft_personalized_opener — ~600 input tokens, ~150 output tokens, ≈$0.0005 per call.
  • classify_reply_intent — ~300 input tokens, ~80 output tokens, ≈$0.0003 per call.

For a cold-leads run that drafts 25 openers and triages 5 replies per day, that’s ~$0.014/day, or roughly $0.42/month, in Compose costs. The Apollo data + Gmail send are the dominant costs, not the LLM.
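The daily figure is simple arithmetic over the per-call estimates above; a quick sanity check in Python (the per-call costs are the approximate Haiku numbers from the table, not billed rates):

```python
# Approximate per-call Haiku costs from the table above.
OPENER_COST = 0.0005   # draft_personalized_opener
TRIAGE_COST = 0.0003   # classify_reply_intent

def daily_compose_cost(openers: int, triages: int) -> float:
    """Estimated Compose spend for one day of a cold-leads run."""
    return openers * OPENER_COST + triages * TRIAGE_COST

cost = daily_compose_cost(openers=25, triages=5)
print(f"${cost:.3f}/day, ${cost * 30:.2f}/month")  # → $0.014/day, $0.42/month
```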

For high-stakes prospects, pass model='claude-sonnet-4-6' to use Sonnet for higher-quality drafts at ~10× the cost. The exact model used is always reflected in the result’s model_used field.

How the operations work

draft_personalized_opener

The system prompt enforces a strict template:

  1. Lead with a specific reference to the recipient’s situation (the recent_signal you provide).
  2. State relevance in one sentence.
  3. End with one clear, low-pressure CTA.
  4. Under 90 words total.
  5. Forbidden phrases: “hope you’re well”, “wanted to reach out”, “circling back”, “synergies”, “leverage”, “in the [industry] space”, and other SDR-template tells.
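The template constraints are also easy to lint on the receiving side as a belt-and-braces check; a minimal sketch (the helper name is illustrative, and the phrase list is a subset of the rules above):

```python
# Subset of the forbidden SDR-template phrases from the template rules.
FORBIDDEN_PHRASES = [
    "hope you're well",
    "wanted to reach out",
    "circling back",
    "synergies",
    "leverage",
]
MAX_WORDS = 90  # "under 90 words total"

def lint_draft(body: str) -> list[str]:
    """Return template violations: forbidden phrases and over-length bodies."""
    lower = body.lower()
    problems = [f"forbidden phrase: {p!r}" for p in FORBIDDEN_PHRASES if p in lower]
    words = len(body.split())
    if words > MAX_WORDS:
        problems.append(f"body is {words} words (max {MAX_WORDS})")
    return problems

lint_draft("Saw your Series B announcement, congrats. Quick question.")  # → []
```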

The LLM is required to call a submit_draft tool with subject + body fields — using tool-use rather than free-form text means the output is always structured cleanly without parsing.

Inputs (LeadContext and SenderContext):

  • lead.name — required.
  • lead.recent_signal — the hook. Without this the draft falls back to “saw your title at company” templates. Spend the Apollo enrichment credit; populate this field.
  • sender.one_line_pitch — what you build, in one sentence. Used to ground the relevance pitch.
  • goal — what you’re asking for. Default: “book a 20-minute intro call”.
  • tone — default: “warm, specific, low-pressure”. Override for different audiences.
  • model — Anthropic model id; defaults to Haiku 4.5.

Output (DraftedEmail): subject, body, model_used, input_tokens, output_tokens.
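The input and output shapes above can be sketched as plain dataclasses. Field names follow the lists above; the types, defaults, and the signature in the comment are assumptions for illustration, not the SDK’s actual definitions:

```python
from dataclasses import dataclass

@dataclass
class LeadContext:
    name: str                 # required
    recent_signal: str = ""   # the hook; populate from Apollo enrichment

@dataclass
class SenderContext:
    one_line_pitch: str       # what you build, in one sentence

@dataclass
class DraftedEmail:
    subject: str
    body: str
    model_used: str
    input_tokens: int
    output_tokens: int

# Illustrative signature; goal, tone, and model are top-level inputs:
# def draft_personalized_opener(
#     lead: LeadContext,
#     sender: SenderContext,
#     goal: str = "book a 20-minute intro call",
#     tone: str = "warm, specific, low-pressure",
#     model: str = "claude-haiku-4-5-20251001",
# ) -> DraftedEmail: ...
```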

classify_reply_intent

The system prompt defines the six intents and tells the model to default to needs_review on ambiguity (better to over-escalate than miss a real “interested”). The submit_classification tool forces a structured response with intent (enum), confidence (0–1), and a one-sentence explanation.

Confidence calibration the prompt asks the model to follow:

  • 0.95+ — unambiguous (e.g. “Yes, Tuesday at 2pm works”).
  • 0.80–0.95 — clear with some interpretation needed.
  • 0.60–0.80 — best guess, agent should double-check.
  • below 0.60 — very uncertain; the agent should pair this with needs_review regardless of intent.
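The bottom calibration band translates into a simple escalation rule on the agent side; a sketch of the “pair with needs_review below 0.60” behaviour (the function name is illustrative):

```python
def effective_intent(intent: str, confidence: float) -> str:
    """Apply the calibration rule: below 0.60, escalate regardless of intent.

    0.60-0.80 results pass through, but the agent should double-check them.
    """
    if confidence < 0.60:
        return "needs_review"
    return intent

effective_intent("interested", 0.97)  # → "interested"
effective_intent("interested", 0.55)  # → "needs_review"
```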

Inputs:

  • reply_text — required. The most recent message body.
  • prior_context — optional. A summary of the thread so far for ambiguous follow-ups.
  • model — defaults to Haiku 4.5.

Output (ReplyClassification): intent, confidence, explanation, model_used, token counts.

The explanation field renders directly in the contact activity timeline, so the human sees why the agent classified the way it did when they review.

Why structured output via tool-use

Compose forces structured output via Anthropic’s tool-use API rather than asking the model to emit JSON in plain text. This is more reliable:

  • Anthropic enforces the schema at the API level — the model can’t return malformed output.
  • A Pydantic model validates a second time on the receiving side, catching edge cases the API-level check misses.
  • Errors fail loudly (a Could not parse classification tool call error) rather than silently (an opener with a missing subject line).

Both operations declare a single tool and pass tool_choice: {type: 'tool', name: '...'} to force the model to use it.
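A minimal sketch of what that looks like with the Anthropic Messages API. The schema here is an illustrative reconstruction of submit_classification, not the skill’s actual definition:

```python
INTENTS = ["interested", "not_interested", "out_of_office",
           "wrong_person", "unsubscribe", "needs_review"]

submit_classification = {
    "name": "submit_classification",
    "description": "Report the classified intent of an email reply.",
    "input_schema": {
        "type": "object",
        "properties": {
            "intent": {"type": "string", "enum": INTENTS},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
            "explanation": {"type": "string"},
        },
        "required": ["intent", "confidence", "explanation"],
    },
}

# Passed to the Messages API with tool_choice forcing the tool:
#   client.messages.create(
#       model="claude-haiku-4-5-20251001",
#       max_tokens=256,
#       tools=[submit_classification],
#       tool_choice={"type": "tool", "name": "submit_classification"},
#       messages=[{"role": "user", "content": reply_text}],
#   )
```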

Failure modes

  • auth_expired: anthropic_api_key not set. Set ANTHROPIC_API_KEY in .env, or add anthropic_api_key to the vault.
  • bad_input: missing required fields (e.g. empty reply_text for classification). The Pydantic validation in the SDK catches this before the API call.
  • rate_limited: Anthropic throttled. The runtime retries automatically with backoff.
  • remote_unavailable: Anthropic returned a 5xx or the connection timed out. Retryable.
  • internal with Could not parse ... tool call (rare): the model didn’t follow tool-use. Indicates either a model-version mismatch or a malformed tool definition. File an issue.
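The failure modes above split cleanly into retryable and terminal; a sketch a caller could use (the mapping mirrors the list above, the dict itself is illustrative):

```python
# Which Compose failure codes are worth retrying, per the list above.
RETRYABLE = {
    "auth_expired": False,        # fix the key; retrying won't help
    "bad_input": False,           # caller error, caught before the API call
    "rate_limited": True,         # runtime already retries with backoff
    "remote_unavailable": True,   # 5xx or timeout; retryable
    "internal": False,            # tool-call parse failure; file an issue
}

def should_retry(code: str) -> bool:
    """Unknown codes default to not retryable."""
    return RETRYABLE.get(code, False)
```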

Verifying

Skills → Compose → click Test next to classify_reply_intent:

{
  "reply_text": "Hey thanks for reaching out. We are actually evaluating a few tools in this space right now — would Tuesday afternoon work for a 20-min call?"
}

It should classify as interested with confidence ~0.95+ and a one-sentence explanation. If you see auth_expired, check ANTHROPIC_API_KEY.

For the draft test, see the smoke-test command in the operations module — paste a LeadContext + SenderContext JSON object, click Run.

See also

  • Skills overview — how skills work in general.
  • Apollo — produces the lead context Compose drafts against.
  • Gmail — actually sends the drafted email.
  • Pipelines — where the contact activities (drafted, contacted, replied, triaged) get logged.