f2: See the loop + tokens
In f1 you watched the reply arrive, first as result.output, then as a stream. Either
way, that text is the headline, but it’s the last line of something bigger: a short
conversation the agent ran for you.
One note on delivery: this challenge calls await agent.run(...), not run_stream. The
lesson here is reading the finished run (its messages, requests, and token counts), and
code that inspects a run wants the whole thing. Stream when a human watches; use run() when
code consumes.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run `make f2` and read the reply plus the list of messages the run produced, and scroll up to spot the trace.
- Edit `start/agent.py`: do TODO 1 (write the metadata block to print `requests` and `usage`), TODO 2 (change the `instructions` and watch token counts move), TODO 3 (read the same run as a span tree).
- Done when you can read `requests: 1` and the token counts, point at the `agent run` and `chat` spans, and say what would make `requests` grow.
In this challenge we open the hood. We’ll read that conversation, count the tokens it cost, and watch the whole run show up as a trace: live in the autotel-devtools browser viewer, and printed to your console too.
Mental model
A run is this little message list; usage().requests is how many model calls it took, and tokens are counted across the whole list.
The mechanic, on a throwaway bot
Every result is more than .output. Here’s the whole object, read off a one-line bot that
has nothing to do with TripMate:
Reading the metadata is just printing the usage fields; writing those lines is your TODO 1, on TripMate’s result rather than this bot’s.
requests is the number to keep an eye on. It’s 1 here: one request, one response. In
f4 you give the agent a tool and watch it climb, because the agent has to go back to the
model once the tool has run.
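To see why the count climbs, here is a toy sketch of that loop in plain Python. It does not use Pydantic AI: ToyRun, fake_model, and run_loop are hypothetical names invented for illustration, and the "model" is just a function that asks for a tool once (if one exists) and then answers in text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Hypothetical stand-in for a finished run: the messages plus a request counter."""
    messages: list[dict] = field(default_factory=list)
    requests: int = 0


def fake_model(messages: list[dict], tools: dict) -> dict:
    """Pretend model: asks for a tool once if one is available, otherwise replies in text."""
    already_called = any(m["kind"] == "tool-return" for m in messages)
    if tools and not already_called:
        return {"kind": "tool-call", "tool": next(iter(tools))}
    return {"kind": "text", "content": "Here is your answer."}


def run_loop(prompt: str, tools: dict | None = None) -> ToyRun:
    """Call the model until it answers in text, counting one request per model call."""
    tools = tools or {}
    run = ToyRun(messages=[{"kind": "user-prompt", "content": prompt}])
    while True:
        run.requests += 1  # every trip to the model is one request
        reply = fake_model(run.messages, tools)
        run.messages.append(reply)
        if reply["kind"] == "text":
            return run
        # The loop runs the tool, appends its result, then goes back to the model.
        tool_result = tools[reply["tool"]]()
        run.messages.append({"kind": "tool-return", "content": tool_result})


plain = run_loop("Where should I go in May?")
with_tool = run_loop("What's the weather in Lisbon?", tools={"get_weather": lambda: "22C, sunny"})

print("no tool:  ", plain.requests, "request(s),", len(plain.messages), "messages")
print("with tool:", with_tool.requests, "request(s),", len(with_tool.messages), "messages")
```

Without a tool the loop exits after one model call; with a tool it needs a second call to turn the tool result into an answer, which is the growth in requests this section describes.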
Open start/agent.py. It’s the f1 TripMate agent, switched to await agent.run(...). Your job is to read the same fields off its result; the metadata logs
aren’t written for you.
Run it with `make f2`.
You get the reply, then the list of messages the model produced. And if you scroll up, something printed above your own logs: the run, drawn as a tree of spans. That’s the trace, and it’s on for every challenge from here.
Build it
1. Log the run metadata (TODO 1). Under the message loop, add a `--- run metadata ---` section that prints the same fields you saw on the throwaway bot: `result.usage().requests` and `result.usage()`. Write the lines yourself; nothing is pre-commented. Run again and you should see `requests: 1` and a `usage` line with the token counts. That’s the simplest possible loop: one request in, one response out.
2. Watch the tokens move (TODO 2). Every call spends tokens, and what you put in the instructions is part of the bill. Add a length rule to the instructions and compare the counts to step 1. Fewer words out, fewer output tokens. Now try the opposite: paste a long paragraph of persona into the instructions and watch the input tokens go up instead, because the instructions are sent on every call. Predict the direction before each run.
3. Read the trace (TODO 3). In another terminal run `make devtools` and open http://127.0.0.1:4446, then run `make f2` again. The run appears as a tree of spans you can click into, with the model, the prompt, the response, and the token usage on each. The same spans also print above your own logs in the console. This is logfire, switched on by the `enable_tracing()` call at the top of the file: it prints to the console and, when the viewer is running, streams to autotel-devtools. f1 left it out so its first run was clean; from here it’s on for every challenge. The TypeScript path shows the same spans through autotel.
4. Check you’ve got it. You should be able to point at the `agent run` and `chat` spans, read the token counts off the usage line, and say why `requests` is `1` here and what would make it grow.
Stuck? finish/agent.py is the canonical version. Read it after you’ve had a real go.
- The token fields look low or odd. Some local Ollama builds report partial counts. `requests` is the load-bearing number; token counts are a bonus when they’re there.
- `all_messages()` looks like nested objects, not strings. That’s correct. A message is a list of typed parts (`UserPromptPart`, `TextPart`, and later tool-call and tool-return parts). The shape is the point.
- No trace prints. Make sure the file calls `enable_tracing()` (f2 onward do; f1 leaves it off by design).
A couple of things worth knowing
Why is the answer "just the last message"?
A model returns the next message in a conversation, not a standalone answer. Pydantic AI
sends a list of messages (your instructions, the user’s prompt), the model appends one
(its reply), and result.output is the text of that appended message.
When tools arrive, the model appends a tool-call message, the framework appends a
tool-return message, and the model appends another reply. The answer is always the last
message in a growing list, and reading that list with all_messages() is how you debug
every later challenge.
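The shape of that growing list can be sketched with a couple of dataclasses. This is a simplified illustration, not Pydantic AI’s real message classes: ToyPart, ToyMessage, and output_of are made-up names, the conversation is invented, and instructions are left out to keep the list short.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyPart:
    kind: str      # "user-prompt", "text", "tool-call", or "tool-return"
    content: str


@dataclass
class ToyMessage:
    sender: str            # "request" (sent to the model) or "response" (from the model)
    parts: list[ToyPart]


def output_of(messages: list[ToyMessage]) -> str:
    """Return the text of the last message, which is where the answer lives."""
    last = messages[-1]
    return "".join(part.content for part in last.parts if part.kind == "text")


# An invented conversation with one tool round trip.
conversation = [
    ToyMessage("request", [ToyPart("user-prompt", "Is Lisbon warm in May?")]),
    ToyMessage("response", [ToyPart("tool-call", "get_weather(city='Lisbon')")]),
    ToyMessage("request", [ToyPart("tool-return", "22C, sunny")]),
    ToyMessage("response", [ToyPart("text", "Yes, around 22C and sunny.")]),
]

# Print the list the way you would scan it when debugging: one line per message.
for index, message in enumerate(conversation):
    kinds = ", ".join(part.kind for part in message.parts)
    print(f"{index}: {message.sender:<8} [{kinds}]")

print("output:", output_of(conversation))
```

Reading the kinds top to bottom shows the order things happened in; the answer is only the text part of the final entry.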
What exactly is a token?
Models read and write tokens: chunks of text, often a word or a piece of one. “Lisbon” might be one token; “unforgettable” might be three.
The input tokens are roughly how much you sent (instructions plus prompt plus any tool results), and the output tokens are how much it wrote back. Providers bill and rate-limit by token count, which is why “be concise” is both a style choice and a cost choice.
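A back-of-the-envelope sketch makes the direction of the bill easier to predict. The counter below is deliberately crude (it counts word runs and punctuation marks, which is not how any real tokenizer works), the instruction strings are invented, and the multiplication by requests ignores tool results, so treat the numbers as relative only.

```python
import re


def rough_tokens(text: str) -> int:
    """Crude estimate: one token per run of word characters or punctuation mark.

    Real tokenizers split differently per model; this is only for comparing directions.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def rough_input_tokens(instructions: str, prompt: str, requests: int = 1) -> int:
    """Instructions and prompt go out on every request, so they are counted each time."""
    return requests * (rough_tokens(instructions) + rough_tokens(prompt))


prompt = "Plan a weekend in Lisbon."
short_instructions = "You are TripMate."
long_instructions = short_instructions + (
    " You are warm, witty, and obsessed with trains, pastries, and sunsets."
)

short_total = rough_input_tokens(short_instructions, prompt)
long_total = rough_input_tokens(long_instructions, prompt)

print("short instructions:", short_total)
print("long instructions: ", long_total)
print("long, two requests:", rough_input_tokens(long_instructions, prompt, requests=2))
```

The absolute values will not match what a provider reports, but the ordering should: longer instructions raise the input side, and every extra request pays for them again.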
What is the console trace actually showing?
Each run emits OpenTelemetry spans: timed, named records of what happened. The agent’s
run becomes a span, the model call a child span, with the model, prompt, response, and
token usage attached as attributes. logfire collects these and prints them to your
console as a tree. logfire is a fully OTLP-compliant OpenTelemetry SDK, so the same spans
can go to a viewer like otel-tui or Jaeger: set WORKSHOP_OTLP_ENDPOINT to its address
and run again.
That’s the hood open. Next up is f3, where we make the model return typed data instead of prose, so your code can use the result without parsing it.