p1: Prompt chaining

You start the Patterns track here. In Foundations each challenge was one model call. From now on you compose several of them, in code you write. This challenge teaches the chain. You break a task into small steps, run them in order, and put a gate between them so a bad result never flows downstream.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run npm run p1 and read the draft: the writer drops the £500 budget, the call to action, or both, with no review or repair.
  2. Edit start/agent.ts: call the reviewer (TODO 1), write the plain-code gate (TODO 2), fill the editor’s instructions and call it (TODO 3), then re-review the fix (TODO 4).
  3. Done when the final verdict prints both mentionsBudget and hasCallToAction as true, after the editor has repaired only what the gate flagged.

The task is to produce a trip pitch that mentions the traveller’s £500 budget and ends with a call to action. Ask one prompt for all of that at once and you cannot tell whether the model did it. A chain makes the work easier to debug. Draft the pitch, judge the draft, fix only what the judge flagged, then check the fix.

The gate is what matters. The reviewer returns a typed verdict, so the decision to ship or to fix is an ordinary if. That is what makes a workflow a workflow: you hold the control flow, in code you can read and test.

draft  ->  [ review: typed verdict ]  ->  gate  --pass-->  ship
                                            \--fail-->  edit only what failed  ->  ship

The reviewer returns a typed verdict, so a plain if in your code decides whether to ship or to fix.

Forget travel. Say you gate a pull-request description: it must summarise the change and give a test plan. A reviewer returns a typed verdict, so the decision is a plain if, not more prose:

import { Output, ToolLoopAgent } from "ai";
import { z } from "zod";

// `model` and `editor` (a second ToolLoopAgent) are assumed to be defined elsewhere.
const prCheck = z.object({
  hasSummary: z.boolean().describe("true only if the description summarises the change"),
  hasTestPlan: z.boolean().describe("true only if it says how the change was tested"),
});
const reviewer = new ToolLoopAgent({ model, output: Output.object({ schema: prCheck }), instructions: "..." });

// The gate is plain code, because the verdict is typed:
async function gatePrDescription(pr: string): Promise<string> {
  const { hasSummary, hasTestPlan } = (await reviewer.generate({ prompt: pr })).output;
  const missing = [
    hasSummary ? null : "add a one-line summary",
    hasTestPlan ? null : "add a test plan",
  ].filter(Boolean);
  if (missing.length === 0) return pr;                       // pass: ship it
  const fixed = await editor.generate({ prompt: `Fix only: ${missing.join("; ")}\n\n${pr}` });
  return fixed.text;                                         // fail: ship the targeted repair
}

Three moves. The reviewer’s Output.object turns its judgement into booleans (the f3 trick), so the gate is an ordinary if. The missing list holds only the failed checks, so the editor repairs those and leaves the rest alone. And the reviewer is a separate call from the writer, so its verdict is independent of the thing it judges. Below you write that gate and the editor’s brief for TripMate.

Open start/agent.ts. The writer and the reviewer are provided. The writer is deliberately never told about the £500 budget or the call to action, so the gate has something real to catch; the reviewer returns a typed { mentionsBudget, hasCallToAction } verdict. What is blank: the gate, the editor’s instructions, and the calls that wire them.
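
To see why a typed verdict makes the gate trivial, here is a minimal sketch of what "typed" buys you. It is not the workshop's reviewer: parseVerdict is a hypothetical, hand-rolled stand-in for the validation that a schema-backed output performs, and the raw JSON string is made up for illustration.

```typescript
// The shape the reviewer is described as returning.
type Verdict = {
  mentionsBudget: boolean;
  hasCallToAction: boolean;
};

// Hypothetical stand-in for schema validation: accept raw JSON text,
// return a typed Verdict, or throw if the shape is wrong.
function parseVerdict(raw: string): Verdict {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("verdict must be a JSON object");
  }
  const { mentionsBudget, hasCallToAction } = data as Record<string, unknown>;
  if (typeof mentionsBudget !== "boolean" || typeof hasCallToAction !== "boolean") {
    throw new Error("verdict must carry two boolean fields");
  }
  return { mentionsBudget, hasCallToAction };
}

// Once parsed, the verdict is ordinary data that an `if` can branch on.
const verdict = parseVerdict('{"mentionsBudget": false, "hasCallToAction": true}');
const passes = verdict.mentionsBudget && verdict.hasCallToAction;
console.log(passes ? "gate passed" : "needs repair"); // prints "needs repair"
```

Anything that is not two booleans is rejected before your code branches on it, so the gate never has to interpret prose.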

npm run p1

The writer produces a vivid Lisbon pitch. Read it back: it often does not mention the £500 budget, and it often does not ask you to book anything. Nothing checks its work yet, so you have a nice paragraph and no way to repair it.

That dead end is what the rest of the file fixes.

  1. Run it and read the draft. Run npm run p1 and look at what comes back. The writer was never asked for the budget or a call to action, so it usually drops one or both. There is no review, no gate, no repair path.

  2. Call the reviewer and read the verdict (TODO 1). Run the reviewer on the draft and read its typed output, the same call-an-agent, read-.output move you used in f3. Print the two booleans and the notes, so you can see what the gate will act on.

  3. Write the gate (TODO 2). This is the core of the pattern, and it is plain code, not a model call. Read the two booleans on verdict. When both are true, log that the gate passed and return early: nothing to fix. Otherwise build a missing array holding only the failed checks, phrased as instructions (“mention the £500 budget”, “end with a one-line call to action”). A pair of ternaries plus .filter(Boolean) gives you that list; the editor uses it next.

  4. Write the editor’s instructions and call it (TODO 3). Fill in the empty instructions on the editor agent: fix only the missing points, preserve the draft’s voice, keep the length, return only the rewritten pitch. Then call editor.generate(...) the same way you called the reviewer, but feed it a prompt that names the missing points (join them) alongside the original draft. Sending only the failed points, not “make it better”, is what keeps the edit small and faithful.

  5. Re-review the edit (TODO 4). This is the same review call you made above in step 2, run again on the edited draft instead of the original. Read its typed output into a final verdict and print it. Without this re-check you only know you attempted a repair, not whether it worked.

  6. Try to skip the chain (poke it). Add "mention the £500 budget and end with a call to action" to the writer’s instructions, so the draft attempts everything in one shot. Run it a few times. A small model still drops one requirement now and then. The gate catches those misses and routes them to a repair, which moves you from “usually” towards “every time”. That check is the part a single prompt cannot give you.

  7. Check you’ve got it. You should be able to point at the gate line and say why it is code and not a model call, and say in one sentence how a workflow differs from an agent. Scroll up to the trace too: you will see separate ai.generateText spans in sequence (writer, reviewer, maybe editor, then final reviewer), each with its own token usage. When the gate passes first time, the editor and final-review spans are absent, because the repair path never ran.
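
Steps 2 to 5 can be sketched as plain TypeScript. This is one possible shape under stated assumptions, not the contents of finish/agent.ts: the names missingPoints, buildEditorPrompt, reviewAndRepair and the Steps type are hypothetical, and review and edit stand in for the real reviewer and editor calls so the control flow can run without a model.

```typescript
type Verdict = { mentionsBudget: boolean; hasCallToAction: boolean };

// The gate's list: one instruction per failed check, nothing for checks that passed.
function missingPoints(verdict: Verdict): string[] {
  return [
    verdict.mentionsBudget ? null : "mention the £500 budget",
    verdict.hasCallToAction ? null : "end with a one-line call to action",
  ].filter((point): point is string => point !== null);
}

// The editor's prompt names only the failed points, followed by the original draft.
function buildEditorPrompt(draft: string, missing: string[]): string {
  return `Fix only: ${missing.join("; ")}\n\n${draft}`;
}

// Hypothetical step signatures; in the challenge these would wrap agent calls.
type Steps = {
  review: (pitch: string) => Promise<Verdict>;
  edit: (prompt: string) => Promise<string>;
};

async function reviewAndRepair(
  draft: string,
  steps: Steps,
): Promise<{ pitch: string; verdict: Verdict; edited: boolean }> {
  const verdict = await steps.review(draft);            // step 2: typed verdict
  const missing = missingPoints(verdict);               // step 3: the gate, plain code
  if (missing.length === 0) {
    return { pitch: draft, verdict, edited: false };    // passed: the editor never runs
  }
  const pitch = await steps.edit(buildEditorPrompt(draft, missing)); // step 4
  const finalVerdict = await steps.review(pitch);       // step 5: re-review the edit
  return { pitch, verdict: finalVerdict, edited: true };
}
```

The type guard in missingPoints does the same filtering as .filter(Boolean) and also tells TypeScript that the result is a string[]. Because review and edit are passed in, you can test the gate with stub functions before spending a single token.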

Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.

  • A gate that is not plain code. If you find yourself asking a model “should I rewrite this?”, stop. The reviewer already returned booleans; the decision is an if.
  • Feeding the editor everything. Send only the failed points, not “make it better”. A specific instruction produces a faithful edit; a vague one rewrites the voice away.
  • Over-checking. Two concrete, checkable criteria beat ten fuzzy ones. The reviewer is only as reliable as its criteria are concrete.
  • No exit. This chain runs each step once. If you ever loop a fix-and-recheck (the p4 evaluator-optimizer), bound the loop so it cannot run forever.
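
The last point can be made concrete. The sketch below shows one way to bound a fix-and-recheck loop. It is an illustration only: repairWithLimit and maxRounds are hypothetical names, and review and edit are synchronous stand-ins for what would be awaited model calls in real code.

```typescript
type Verdict = { mentionsBudget: boolean; hasCallToAction: boolean };

// Hypothetical helper: repeat fix-and-recheck, but never more than maxRounds times.
function repairWithLimit(
  draft: string,
  review: (pitch: string) => Verdict,
  edit: (pitch: string, verdict: Verdict) => string,
  maxRounds: number,
): { pitch: string; passed: boolean; rounds: number } {
  let pitch = draft;
  let verdict = review(pitch);
  let rounds = 0;
  // Stop as soon as both checks pass, or when the round budget is spent.
  while (!(verdict.mentionsBudget && verdict.hasCallToAction) && rounds < maxRounds) {
    pitch = edit(pitch, verdict);
    verdict = review(pitch);
    rounds += 1;
  }
  return { pitch, passed: verdict.mentionsBudget && verdict.hasCallToAction, rounds };
}
```

The caller gets back passed: false when the budget runs out, so a stubborn draft ends in a visible failure and not an endless loop.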

Workflows vs agents

This distinction, from Anthropic’s “Building effective agents”, is the one the rest of the workshop turns on.

A workflow orchestrates model calls through code paths you write. You decide the order and the branches. That is this challenge.

An agent is a model that directs its own process and tool use in a loop. The model decides the order. That is the agentic challenge later in the workshop.

Workflows give you predictability and a place to put checks; agents give you flexibility when you cannot predict the steps. You are learning to choose between them.
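
The difference shows up in who owns the control flow. The sketch below is a deliberately simplified contrast, not how either pattern is built in this workshop: runWorkflow, runAgent, decide and maxSteps are hypothetical names, and decide stands in for the model call that would choose the next step in a real agent.

```typescript
type Step = "draft" | "review" | "edit";

// Workflow: the order lives in your code. The same steps run in the same order every time.
function runWorkflow(run: (step: Step) => void): void {
  run("draft");
  run("review");
  run("edit");
}

// Agent: a decision function picks the next step, or stops.
// `decide` is a stand-in for a model call; maxSteps bounds the loop.
function runAgent(
  decide: (history: Step[]) => Step | "done",
  run: (step: Step) => void,
  maxSteps: number,
): Step[] {
  const history: Step[] = [];
  while (history.length < maxSteps) {
    const next = decide(history);
    if (next === "done") break;
    run(next);
    history.push(next);
  }
  return history;
}
```

You can read the workflow's path straight off the source. The agent's path exists only at run time, which is where its flexibility comes from and why it is harder to put checks around.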

Why split the writer and the reviewer at all?

You could ask one agent to “write a pitch, on budget, with a call to action, and tell me if you succeeded”. On a large model that often works. It degrades on a small one, and it hides the failure: the same call that writes the pitch also grades it, so a miss and a false “looks good” arrive together.

Splitting the roles gives each call one job and gives you an independent verdict you can gate on. This is the separation-of-concerns reason Anthropic gives for chaining and routing.

When is a chain the wrong tool?

When you cannot predict the steps. A chain is a fixed path: draft, review, edit, in that order, always. If the work needs a different number or order of steps depending on the input (search this, then maybe search that, then maybe call a tool), a fixed chain fights you and an agent fits better. The agentic challenge is that case.

Next up is p2, where the gate becomes a fork. You classify the input first, then send it down a different path depending on what it is.