f3: Structured output

So far TripMate has answered in prose. That reads fine to a person, but it’s awkward for code: if your app needs the packing list as an actual list, you’re left scanning the text and hoping the model formatted it the way you expected.

Most of the time you want data from a model: a shape your code can read field by field, with nothing to parse. This challenge makes the same call return a typed object. output_type= is the switch that turns structured output on; the Field(description=...) text you write on each field tells the model what to put in each slot.

In a hurry? Here’s the whole challenge. Everything below is the why and the how.

  1. Run make f3 and watch it print a flowing paragraph you’d have to scrape the packing list out of.
  2. Read the worked example below to learn the mechanic on a tiny throwaway model.
  3. Edit start/agent.py to make TripMate a recommendation agent: wire output_type, then write a Field(description=...) for each field of the Recommendation model (the field names are given; the descriptions are yours to author).
  4. Done when it prints labelled fields instead of prose, result.output.packing_essentials is a real list[str] your code could loop over, and you’ve watched a vague description produce a worse value than a sharp one.
prompt  ->  model  ->  "Lisbon is lovely, pack layers..."          (prose: you parse it and hope)
                   \->  Recommendation(destination=..., packing=[...])  (typed: use it directly)

Typed output turns a paragraph you’d have to scrape into fields your code can read. Each field’s Field(description=...) text is what tells the model what to put there.

Before you touch the recommendation agent, here’s the whole mechanic on something tiny and unrelated, a country fact card. Two moves: describe the shape with a Pydantic model, pass it as output_type=.

from pydantic import BaseModel, Field
from pydantic_ai import Agent


class FactCard(BaseModel):
    capital: str = Field(description="the country's capital city")
    languages_spoken: list[str] = Field(description="the official languages, most common first")


agent = Agent(
    model,
    output_type=FactCard,                    # <- the lever: prose mode -> typed mode
    instructions="Give a short fact card for the country the user names.",
)

result = await agent.run("Portugal")

card = result.output          # typed: no parsing
card.languages_spoken         # a real list[str]

Four things to take from this:

  • output_type= is the lever. It switches the agent from text mode to typed mode, and result.output becomes a validated object instead of a string. Without it, result.output is the raw prose.
  • This is await agent.run(...), not the run_stream you ended f1 on. A typed object is consumed by code, and code can’t act on half a JSON object. Stream when a human watches; run() when code consumes.
  • Field names matter (the model reads them), but the Field(description=...) text does the heavy lifting. "the country's capital city" is why capital comes back as a city and not the country name. A vague description gives a vague fill.
  • The model above is throwaway. Your job is to apply the same two moves to a model that actually matters.
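
To see why the description text carries so much weight, it can help to look at the JSON Schema that Pydantic generates from the class. The sketch below uses only Pydantic (no agent, no model call) and assumes Pydantic v2, where model_json_schema() is available. How Pydantic AI packages this schema for a particular provider is not shown; the point is only that the descriptions travel with the schema.

```python
from pydantic import BaseModel, Field


class FactCard(BaseModel):
    capital: str = Field(description="the country's capital city")
    languages_spoken: list[str] = Field(
        description="the official languages, most common first"
    )


# Pydantic builds a JSON Schema from the class; each description is copied in.
schema = FactCard.model_json_schema()

for name, spec in schema["properties"].items():
    print(f"{name}: {spec['description']}")
```

Each printed line pairs a field name with the description you wrote, which is roughly the per-field guidance the model has to work with.
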
make f3

Out of the box you get a flowing paragraph under result.output. It reads well. Now ask how you’d pull the packing list out of it in code: search for a “Packing:” line, split on commas or bullets, and redo it every time the model rewords things. That fragility is the problem a typed output model fixes.
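
Here is a rough sketch of that fragility, using only the standard library. The scrape_packing_list helper and both reply strings are made up for illustration; they are not output from TripMate.

```python
import re


def scrape_packing_list(prose: str) -> list[str]:
    """Naive scraper: find a 'Packing:' line and split it on commas."""
    match = re.search(r"^Packing:\s*(.+)$", prose, flags=re.MULTILINE)
    if match is None:
        return []  # the model worded it differently, so we find nothing
    return [item.strip() for item in match.group(1).split(",")]


# Two invented replies that say the same thing in different words.
reply_a = "Lisbon is lovely in spring.\nPacking: light jacket, walking shoes, sunscreen"
reply_b = "Lisbon is lovely in spring. Bring a light jacket, walking shoes and sunscreen."

list_a = scrape_packing_list(reply_a)
list_b = scrape_packing_list(reply_b)

print(list_a)
print(list_b)
```

The first reply yields three items and the second yields an empty list, even though a person would read the same packing advice in both.
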

Your challenge: TripMate the recommendation agent

Open start/agent.py. The aim is a TripMate that takes a traveller and returns a typed recommendation. The model’s field names are given so your output lines up with the person next to you, but every field ships with no description, and that’s the part you write.

  1. Wire structured output on (TODO 1). Import BaseModel and Field from pydantic, define the Recommendation model, add output_type=Recommendation to the agent, and swap the prose print for a typed read off result.output. You did each of these moves in the worked example above; apply them here. Don’t copy the FactCard model; you’re building Recommendation.

  2. Write the descriptions (TODO 2). This is the real work. Each field starts bare:

    class Recommendation(BaseModel):
        destination: str                       # Field(description=...), what should this say?
        why_to_visit: str                      # Field(description=...), how long? what angle?
        packing_essentials: list[str]          # Field(description=...), how many? how concrete?

    Add a Field(description=...) to each field. Predict what each one will produce before you run. Want two sentences of reasoning? Say so. Want 3–5 concrete items, not “appropriate clothing”? Say that. The description is the only instruction the model has for that field.

  3. Break a description on purpose (TODO 3). Once it works, make one field’s description deliberately vague, for example why_to_visit: str = Field(description="some text"), and run again. Watch that field get vaguer or drift while the well-described fields hold. Same model, same prompt; the only thing you changed was the words in the description. That’s the lesson: the description is the interface to the field. Tighten it and the value sharpens.

  4. Check you’ve got it. You can run make f3 and watch it print a paragraph first, then labelled fields after your edits; point at result.output.packing_essentials as a list you could pass straight to code; and show a sharp description and a vague one producing visibly different values for the same field.

Stuck? finish/agent.py is the canonical version. Read it after you’ve had a real go, and notice how your description text differs from the reference. There’s no single right answer; that’s the point.

Why a schema instead of asking for JSON in the prompt?

You could write “reply as JSON with these fields” in the prompt, but then you’re trusting the model to format it perfectly every time and parsing whatever comes back.

output_type= does better. Pydantic AI tells the LLM to produce data matching your Pydantic model, then validates the reply against that model before your code sees it. If the LLM does something weird, like putting a number where a string should go, you get a clear validation error, never a half-parsed string.
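
The validation half of that can be sketched with Pydantic alone. This is not Pydantic AI’s internal code path, just the same kind of check run by hand, assuming Pydantic v2. Both JSON strings are invented stand-ins for a model reply, and the Recommendation fields are left bare as in the starter file.

```python
from pydantic import BaseModel, ValidationError


class Recommendation(BaseModel):
    destination: str
    why_to_visit: str
    packing_essentials: list[str]


# Made-up replies standing in for what a model might send back.
good_reply = (
    '{"destination": "Lisbon", "why_to_visit": "Mild weather.", '
    '"packing_essentials": ["light jacket"]}'
)
bad_reply = (
    '{"destination": 42, "why_to_visit": "Mild weather.", '
    '"packing_essentials": ["light jacket"]}'
)

# A well-formed reply becomes a typed object.
rec = Recommendation.model_validate_json(good_reply)
print(rec.packing_essentials)

# A number where a string should go is rejected, and the error names the field.
try:
    Recommendation.model_validate_json(bad_reply)
except ValidationError as exc:
    failed_fields = [error["loc"][0] for error in exc.errors()]
    print("rejected:", failed_fields)
```

The error carries the name of the offending field, which is what makes it easier to act on than a parse failure somewhere in a string.
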

Where did the prose go?

In typed mode the LLM’s job is to fill the schema you gave it, so result.output is the typed object you read, field by field. The console trace from f2 shows the structured response attached to the call.
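
As a sketch of what “field by field” looks like in practice, the snippet below builds a Recommendation by hand as a stand-in for result.output. The values are invented and no agent is called; the checklist at the end is a hypothetical downstream use, not part of the challenge.

```python
from pydantic import BaseModel


class Recommendation(BaseModel):
    destination: str
    why_to_visit: str
    packing_essentials: list[str]


# Hand-built stand-in for result.output; the values are invented.
output = Recommendation(
    destination="Lisbon",
    why_to_visit="Mild spring weather and walkable hills.",
    packing_essentials=["light jacket", "walking shoes", "sunscreen"],
)

# Labelled fields instead of prose: plain attribute reads, no parsing.
print(f"Destination: {output.destination}")
print(f"Why visit:   {output.why_to_visit}")
for number, item in enumerate(output.packing_essentials, start=1):
    print(f"  {number}. {item}")

# Because it is a real list, other code can consume it directly.
checklist = {item: False for item in output.packing_essentials}
```
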

If a field comes back vague or the wrong length, that’s the Field(description=...) text doing, or failing to do, its job; tighten it. Small local models honour descriptions loosely, so a “two sentences” field will sometimes run long. If the run raises a validation error mentioning the schema, the model produced something the schema rejected, so loosen the type (int | None, str | None) or make the description clearer about the shape.
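
Loosening a type can be sketched like this. The budget_eur field and both classes are hypothetical (they are not part of Recommendation), and Optional[int] is used as the spelling of int | None that also works on older Python versions. Assumes Pydantic v2.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class StrictTrip(BaseModel):
    destination: str
    budget_eur: int  # hypothetical field: a value is always required


class LooseTrip(BaseModel):
    destination: str
    budget_eur: Optional[int] = None  # same meaning as int | None


# Invented reply in which the model had no number to offer.
reply = {"destination": "Lisbon", "budget_eur": None}

try:
    StrictTrip.model_validate(reply)
    strict_ok = True
except ValidationError:
    strict_ok = False  # None is not an int, so the strict schema rejects it

loose = LooseTrip.model_validate(reply)
print(strict_ok, loose.budget_eur)
```

The trade-off is that your code now has to handle None for that field, so loosen a type only where a missing value is genuinely acceptable.
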

You just learned that descriptions are an interface: the words you write on a field decide what the model puts there. Hold onto that: in f6 the same idea shows up one level out, where a tool’s description decides which tool the model reaches for. Descriptions steer the model everywhere; here it’s fields, there it’s tools.

Next up is f4, where we give the agent a tool. You’ll watch it call out for information it cannot know, and watch a tool result change its answer.