f2: See the loop + tokens

In f1 you watched the reply arrive, first as result.output, then as a stream. Either way, that text is the headline, but it’s the last line of something bigger: a short conversation the agent ran for you.

One note on delivery: this challenge calls await agent.run(...), not run_stream. The lesson here is reading the finished run (its messages, requests, and token counts), and code that inspects a run wants the whole thing. Stream when a human watches; use run() when code consumes.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run make f2 and read the reply plus the list of messages the run produced, and scroll up to spot the trace.
  2. Edit start/agent.py: do TODO 1 (write the metadata block to print requests and usage), TODO 2 (change the instructions and watch token counts move), TODO 3 (read the same run as a span tree).
  3. Done when you can read requests: 1 and the token counts, point at the agent run and chat spans, and say what would make requests grow.

In this challenge we open the hood. We’ll read that conversation, count the tokens it cost, and watch the whole run show up as a trace: live in the autotel-devtools browser viewer, and printed to your console too.

[ instructions ] + [ user: prompt ]  ->  model  ->  [ assistant: reply ]

A run is this little message list; usage().requests is how many model calls it took, and tokens are counted across the whole list.

Every result is more than .output. Here’s the whole object, read off a one-line bot that has nothing to do with TripMate:

bot = Agent(model, instructions="Answer in one sentence.")
result = await bot.run("What's the tallest mountain on Earth?")

result.output                  # the headline: the model's reply
result.all_messages()          # the conversation, as structured messages
result.usage()                 # input tokens, output tokens, requests
result.usage().requests        # model round-trips (1 with no tools)

Reading the metadata is just printing the usage fields; writing those lines is your TODO 1, on TripMate’s result rather than this bot’s.

requests is the number to keep an eye on. It’s 1 here: one request, one response. In f4 you give the agent a tool and watch it climb, because the agent has to go back to the model once the tool has run.

Open start/agent.py. It’s the f1 TripMate agent, switched to await agent.run(...). Your job is to read the same fields off its result; the metadata logs aren’t written for you.

Run it:

make f2

You get the reply, then the list of messages the run produced. And if you scroll up, something printed above your own logs: the run, drawn as a tree of spans. That’s the trace, and it’s on for every challenge from here.

  1. Log the run metadata (TODO 1). Under the message loop, add a --- run metadata --- section that prints the same fields you saw on the throwaway bot: result.usage().requests and result.usage(). Write the lines yourself; nothing is pre-commented. Run again and you should see requests: 1 and a usage line with the token counts. That’s the simplest possible loop: one request in, one response out.

  2. Watch the tokens move (TODO 2). Every call spends tokens, and what you put in the instructions is part of the bill. Add a length rule and compare the counts to step 1:

    instructions="You are TripMate, a friendly trip planner. Be concise: one sentence.",

    Fewer words out, fewer output tokens. Now try the opposite: paste a long paragraph of persona into the instructions and watch the input tokens go up instead, because the instructions are sent on every call. Predict the direction before each run.

  3. Read the trace (TODO 3). In another terminal run make devtools and open http://127.0.0.1:4446, then run make f2 again. The run appears as a tree of spans you can click into, with the model, the prompt, the response, and the token usage on each.

    The same spans also print above your own logs in the console, something like this:

    20:54:34.207 agent run
    20:54:34.211   chat granite4.1:3b

    This is logfire, switched on by the enable_tracing() call at the top of the file: it prints to the console and, when the viewer is running, streams to autotel-devtools. f1 left it out so its first run was clean; from here it’s on for every challenge. The TypeScript path shows the same spans through autotel.

  4. Check you’ve got it. You should be able to point at the agent run and chat spans, read the token counts off the usage line, and say why requests is 1 here and what would make it grow.

Stuck? finish/agent.py is the canonical version. Read it after you’ve had a real go.

  • The token fields look low or odd. Some local Ollama builds report partial counts. requests is the load-bearing number; token counts are a bonus when they’re there.
  • all_messages() looks like nested objects, not strings. That’s correct. A message is a list of typed parts (UserPromptPart, TextPart, and later tool-call and tool-return parts). The shape is the point.
  • No trace prints. Make sure the file calls enable_tracing() (f2 onward do; f1 leaves it off by design).

Why is the answer "just the last message"?

A model returns the next message in a conversation, not a standalone answer. Pydantic AI sends a list of messages (your instructions, the user’s prompt), the model appends one (its reply), and result.output is the text of that appended message.

When tools arrive, the model appends a tool-call message, the framework appends a tool-return message, and the model appends another reply. The answer is always the last message in a growing list, and reading that list with all_messages() is how you debug every later challenge.
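To make the growing list concrete, here is a minimal sketch of that loop in plain Python. It is a toy model, not Pydantic AI’s implementation: Message, ToyRun, fake_model, and run are hypothetical names invented for this illustration, and the "model" is a stub that asks for a tool once and then answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    # role is one of: "instructions", "user", "assistant", "tool-call", "tool-return"
    role: str
    content: str


@dataclass
class ToyRun:
    messages: List[Message] = field(default_factory=list)
    requests: int = 0

    @property
    def output(self) -> str:
        # The answer is just the text of the last message in the list.
        return self.messages[-1].content


def fake_model(messages: List[Message], tools: Dict[str, Callable[[], str]]) -> Message:
    """Hypothetical stand-in for a model: request a tool once, then reply."""
    already_called = any(m.role == "tool-return" for m in messages)
    if tools and not already_called:
        return Message("tool-call", next(iter(tools)))
    return Message("assistant", "Here is your answer.")


def run(
    instructions: str,
    prompt: str,
    tools: Optional[Dict[str, Callable[[], str]]] = None,
) -> ToyRun:
    tools = tools or {}
    result = ToyRun([Message("instructions", instructions), Message("user", prompt)])
    while True:
        reply = fake_model(result.messages, tools)  # one model round-trip
        result.requests += 1
        result.messages.append(reply)
        if reply.role != "tool-call":
            return result
        # The framework, not the model, runs the tool and appends its result.
        result.messages.append(Message("tool-return", tools[reply.content]()))


plain = run("Answer in one sentence.", "What's the tallest mountain on Earth?")
with_tool = run(
    "Answer in one sentence.",
    "What's the weather in Lisbon?",
    {"get_weather": lambda: "sunny"},
)

print(plain.requests, [m.role for m in plain.messages])
# 1 ['instructions', 'user', 'assistant']
print(with_tool.requests, [m.role for m in with_tool.messages])
# 2 ['instructions', 'user', 'tool-call', 'tool-return', 'assistant']
```

Without tools the loop exits after one round-trip, which is the requests: 1 you see in this challenge. With a tool, the stub has to be called a second time after the tool result is appended, so requests becomes 2 and the list grows by two extra messages.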

What exactly is a token?

Models read and write tokens: chunks of text, often a word or a piece of one. “Lisbon” might be one token; “unforgettable” might be three.

The input tokens are roughly how much you sent (instructions plus prompt plus any tool results), and the output tokens are how much it wrote back. Providers bill and rate-limit by token count, which is why “be concise” is both a style choice and a cost choice.
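If you want a feel for the direction before you run anything, a rough back-of-the-envelope estimate is enough. The sketch below assumes about four characters per token, which is only a common rule of thumb for English text, not how any real tokenizer works; rough_tokens and the sample strings are hypothetical, and the numbers it prints will not match what your model reports.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate: about four characters per token. NOT a real tokenizer."""
    return max(1, len(text) // 4)


prompt = "Plan a weekend in Lisbon."
short_instructions = "You are TripMate, a friendly trip planner."
# Pad the persona to mimic pasting a long paragraph into the instructions.
long_instructions = short_instructions + " You love hidden cafes and night trains." * 10

for label, instructions in [("short", short_instructions), ("long", long_instructions)]:
    # Input is roughly instructions plus prompt; both are sent with the request.
    estimated_input = rough_tokens(instructions) + rough_tokens(prompt)
    print(f"{label}: ~{estimated_input} input tokens (estimate)")
```

The absolute figures are not meaningful; the comparison is. Longer instructions raise the input side of the bill, and they do so on every call.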

What is the console trace actually showing?

Each run emits OpenTelemetry spans: timed, named records of what happened. The agent’s run becomes a span, the model call a child span, with the model, prompt, response, and token usage attached as attributes. logfire collects these and prints them to your console as a tree. logfire is built on OpenTelemetry and exports over OTLP, so the same spans can go to a viewer like otel-tui or Jaeger: set WORKSHOP_OTLP_ENDPOINT to its address and run again.
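The parent and child relationship is easier to see as a data structure. This is a toy model of a span tree in plain Python, not logfire’s or OpenTelemetry’s API: ToySpan and render are hypothetical names, the attribute keys are placeholders, and the token values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySpan:
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["ToySpan"] = field(default_factory=list)


def render(span: ToySpan, depth: int = 0) -> List[str]:
    """Return one indented line per span, parents before children."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder attribute keys and made-up values, for illustration only.
chat = ToySpan("chat granite4.1:3b", {"input_tokens": 42, "output_tokens": 17})
agent_run = ToySpan("agent run", children=[chat])

print("\n".join(render(agent_run)))
# agent run
#   chat granite4.1:3b
```

The indentation in the console trace carries the same information: the chat span sits one level under agent run because the model call happened inside the run. A run with a tool would show more children under the same parent.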

That’s the hood open. Next up is f3, where we make the model return typed data instead of prose, so your code can use the result without parsing it.