p3: Parallelization

Routing (p2) picks one path. Parallelization runs several at once. When subtasks are independent, you do not finish one before starting the next: fire them together and combine the results. Three reviewers judge the same itinerary, one concern each (cost, weather, safety), and an aggregator merges their verdicts into one.

One concern per call beats one call for everything: a reviewer focused only on cost does better on cost than a prompt juggling all three. So you write one of the reviewers, then fan them out.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run make p3: the three reviews run one after another, and safety_reviewer has no prompt yet, so its review is junk.
  2. Edit start/agent.py: write safety_reviewer’s single-concern prompt (TODO 1), replace the three sequential awaits with one asyncio.gather (TODO 2), then aggregate the reviews with the synthesiser (TODO 3).
  3. Done when all three reviews are useful, the reviewer spans overlap in the trace, and one synthesiser verdict prints after they resolve.

       .-->  reviewer A  --.
input  -->  reviewer B  -->  aggregate  ->  result
       '-->  reviewer C  --'

Independent reviewers run at once with asyncio.gather, then one call combines them.

Forget travel. Say you want three quick takes on a pull request, one concern each:

import asyncio

# Sketch only: Agent, model, Review, and diff are assumed to be defined elsewhere.
style_reviewer = Agent(
    model,
    output_type=Review,
    instructions=(
        "Review a code diff for STYLE only: naming, formatting, clarity. "
        "Rate it 1-5 and give one terse comment."
    ),
)
# ...a security_reviewer and a perf_reviewer, each with its own single concern.

# They do not read each other, so fire them together (inside an async function):
style, security, perf = await asyncio.gather(
    style_reviewer.run(diff),
    security_reviewer.run(diff),
    perf_reviewer.run(diff),
)

Two moves. Each reviewer’s prompt names one concern, so it judges that concern well; a prompt juggling style, security, and perf at once does all three poorly. And because the calls are independent, asyncio.gather fires them together instead of one after another. Below you write one of TripMate’s reviewers, then fan all three out.

Open start/agent.py. budget_reviewer and weather_reviewer are written for you, as the shape to copy; each has output_type=Review so its rating is typed. safety_reviewer’s instructions are blank: that single-concern prompt is TODO 1. In main() the three reviews run sequentially, the “before” you replace in TODO 2.

Parallel calls finish in about the time of the slowest one, not the sum of all of them, provided the provider actually runs them concurrently. A hosted model does, as does any IO-bound work. A single local GPU often serves generations one at a time, so against Ollama you may see little drop in wall-clock time. That does not change the lesson: the shape is fan-out-then-aggregate, and it pays off the moment you move to a hosted provider or add more calls. The printed milliseconds are real measurements, so compare them before and after.
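To see the slowest-versus-sum claim without a model, here is a minimal sketch that uses asyncio.sleep as a stand-in for the network wait. fake_review and its delays are hypothetical and not part of start/agent.py; only asyncio.gather and the timing pattern carry over.

```python
import asyncio
import time


async def fake_review(concern: str, seconds: float) -> str:
    """Hypothetical stand-in for a reviewer call; sleeps to mimic waiting on a provider."""
    await asyncio.sleep(seconds)
    return f"{concern}: ok"


async def run_sequential() -> tuple[list[str], float]:
    """Await each review before starting the next: total time is roughly the sum."""
    start = time.perf_counter()
    reviews = [
        await fake_review("budget", 0.2),
        await fake_review("weather", 0.3),
        await fake_review("safety", 0.1),
    ]
    return reviews, time.perf_counter() - start


async def run_parallel() -> tuple[list[str], float]:
    """Start all three together: total time is roughly that of the slowest call."""
    start = time.perf_counter()
    reviews = await asyncio.gather(
        fake_review("budget", 0.2),
        fake_review("weather", 0.3),
        fake_review("safety", 0.1),
    )
    # gather returns results in argument order, not completion order.
    return list(reviews), time.perf_counter() - start


async def main() -> None:
    seq_reviews, seq_time = await run_sequential()
    par_reviews, par_time = await run_parallel()
    print(f"sequential: {seq_time * 1000:.0f} ms")
    print(f"parallel:   {par_time * 1000:.0f} ms")
    print("same reviews:", seq_reviews == par_reviews)


if __name__ == "__main__":
    asyncio.run(main())
```

The two functions return the same reviews in the same order; only the elapsed time differs, which is the "same calls, different timing" point of the pattern.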

  1. Run it and read the gap. Run make p3. The budget and weather reviews are fine, but the safety review is junk (no prompt), and the three run one after another for no reason. The printed time is mostly spent waiting.
  2. Write the safety reviewer (TODO 1). Fill safety_reviewer’s instructions with one concern only: safety and practicality (timing, transport, crowds). Match the shape of the two provided reviewers. Re-run; the safety review should now be sharp.
  3. Replace the awaits with asyncio.gather (TODO 2). The heart of the pattern. The three calls do not depend on each other, so replace the sequential block with one asyncio.gather(...) and unpack as budget, weather, safety. The reviews list below keeps working unchanged: same calls, different timing.
  4. Aggregate (TODO 3). Three raw reviews are not an answer. Add import json, pass the collected reviews as json.dumps(reviews) to the synthesiser, and print its two-sentence verdict.
  5. Add a reviewer (poke it). Add a fourth reviewer to the batch (food or accessibility, for example). One more coroutine in the gather, one more review for the synthesiser; the shape does not change.
  6. Check you’ve got it. Say why these calls are safe to parallelize (they are independent), and name the two flavours (sectioning and voting). The trace shows the three reviewer spans overlapping, then the synthesiser after.
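Steps 3 and 4 together can be sketched as follows. This is an illustration under stated assumptions, not the contents of finish/agent.py: fake_reviewer and fake_synthesiser are hypothetical stubs that replace the real model calls, and the dict fields are invented for the example. What it shows is the shape: gather, collect into a list, hand json.dumps(reviews) to one aggregating call.

```python
import asyncio
import json


async def fake_reviewer(concern: str, rating: int) -> dict:
    """Hypothetical stand-in for a reviewer call; returns a review as a plain dict."""
    await asyncio.sleep(0.01)
    return {"concern": concern, "rating": rating, "comment": f"{concern} looks fine"}


async def fake_synthesiser(reviews_json: str) -> str:
    """Hypothetical stand-in for the synthesiser; summarises ratings instead of calling a model."""
    reviews = json.loads(reviews_json)
    average = sum(r["rating"] for r in reviews) / len(reviews)
    weakest = min(reviews, key=lambda r: r["rating"])
    return f"Average rating {average:.1f}. Weakest concern: {weakest['concern']}."


async def review_itinerary() -> str:
    # Fan out: the three reviews do not depend on each other.
    budget, weather, safety = await asyncio.gather(
        fake_reviewer("budget", 4),
        fake_reviewer("weather", 3),
        fake_reviewer("safety", 5),
    )
    reviews = [budget, weather, safety]
    # Aggregate: one call turns three opinions into one verdict.
    return await fake_synthesiser(json.dumps(reviews))


if __name__ == "__main__":
    print(asyncio.run(review_itinerary()))
```

Adding a fourth reviewer means one more coroutine in the gather and one more entry in reviews; the synthesiser stub needs no change.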

Stuck? finish/agent.py is one version. Read it after you’ve had a real go; your safety prompt will read differently.

  • Parallelizing dependent work. asyncio.gather is safe only when the calls do not need each other’s output. If B needs A’s result, that is a chain (p1), not a fan-out.
  • Forgetting to aggregate. Three raw reviews are not an answer; the aggregation step turns N opinions into one decision.
  • One overloaded prompt. Asking a single call to rate cost, weather, and safety at once blurs all three. Separate, single-concern calls keep each focused.
  • Unbounded fan-out. Firing hundreds of calls at once can hit rate limits. Batch them if the list is large.
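One common way to bound a large fan-out is an asyncio.Semaphore, which caps how many calls are in flight at once. The sketch below is an assumption-laden illustration: run_bounded and fake_call are hypothetical names, and the sleep stands in for a provider call. It also records the peak concurrency so you can check the cap holds.

```python
import asyncio


async def run_bounded(n_calls: int, limit: int) -> tuple[list[int], int]:
    """Run n_calls fake calls with at most `limit` in flight; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_call(i: int) -> int:
        """Hypothetical stand-in for one model call."""
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)
            in_flight -= 1
            return i * 2

    # Still a single gather: the semaphore, not the call site, does the throttling.
    results = await asyncio.gather(*(fake_call(i) for i in range(n_calls)))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(n_calls=20, limit=5))
    print(f"{len(results)} results, peak concurrency {peak}")
```

The right limit depends on the provider's rate limits, which this sketch does not model.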

Sectioning vs voting

This challenge is sectioning: split a task into different subtasks (cost, weather, safety) and run each once. The other flavour is voting: run the same check several times and take a majority, raising confidence on a judgement call. Same asyncio.gather shape, different purpose: sectioning covers more ground, voting buys certainty.
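Voting can be sketched with the same gather shape plus a majority count. This is a minimal illustration, not part of the challenge code: fake_check is a hypothetical stub whose answers are passed in, whereas a real model call could answer differently on each run.

```python
import asyncio
from collections import Counter


async def fake_check(vote: str) -> str:
    """Hypothetical stand-in for one run of the same check."""
    await asyncio.sleep(0.01)
    return vote


async def majority_vote(simulated_votes: list[str]) -> tuple[str, int]:
    """Run the same check several times at once and return the winning answer and its count."""
    votes = await asyncio.gather(*(fake_check(v) for v in simulated_votes))
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count


if __name__ == "__main__":
    winner, count = asyncio.run(majority_vote(["safe", "unsafe", "safe"]))
    print(f"{winner} ({count} of 3 votes)")
```

An odd number of runs avoids ties on a yes/no judgement; with more than two possible answers you would need an explicit tie-break rule.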

Parallelization vs the orchestrator (later in the workshop)

Here you fixed the subtasks in advance: cost, weather, safety, always those three. When you cannot predict the subtasks, an orchestrator agent decides them at runtime and delegates to workers. That is the delegation challenge later on. Parallelization is the fixed-list version; the orchestrator is the dynamic one.

Next up is p4, where one agent’s output is scored by another in a loop until it is good enough.