f2: See the loop + tokens
In f1 you watched the reply arrive, first as result.text, then as a stream. Either
way, that text is the headline: the last line of a short conversation the agent ran
for you.
One note on delivery: this challenge calls .generate(), not .stream(). The lesson
here is reading the finished run (its messages, steps, and token counts), and code
that inspects a run needs the whole thing at once. Stream when a human is watching;
call .generate() when code consumes the result.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run npm run f2 and see the reply plus the raw message list, but no token counts or step count yet.
- Edit start/agent.ts: TODO 1 (write the metadata logs: finishReason, steps, usage), TODO 2 (change instructions and watch the token counts move), TODO 3 (read the same run as a span tree).
- Done when you can point at the two ai.generateText spans, read the token counts off them, and say why steps is 1 and what would make it grow.
In this challenge we open the hood. We’ll read that conversation, count the tokens it cost, and watch the whole run show up as a trace: live in the autotel-devtools browser viewer, and printed to your console too.
Mental model
A run is a short message list: your instructions, the user’s prompt, and the model’s reply. “Steps” is the number of model turns the run took, and tokens are counted across the whole list.
The mechanic, on a throwaway bot
Every result is more than .text. Read off a one-line bot that has nothing to do with
TripMate, the whole object also carries the response messages plus three metadata fields:
finishReason, steps, and usage.
Reading the metadata is just logging the last three of those fields; writing those log lines is your TODO 1, on TripMate’s result rather than this bot’s.
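For reference, here is a minimal sketch of those three log lines, written against a hand-built object instead of a real agent call. RunResult and formatRunMetadata are hypothetical names: the type mirrors only the fields this page reads (finishReason, steps, and usage with inputTokens and outputTokens), not the framework’s full result type, so check your own result before leaning on the shape.

```typescript
// Hypothetical, simplified shape: only the fields this page reads.
// A real framework result carries much more than this.
interface RunUsage {
  inputTokens?: number; // may be undefined on some local model builds
  outputTokens?: number;
}

interface RunResult {
  text: string;
  finishReason: string;
  steps: unknown[]; // one entry per model turn
  usage: RunUsage;
}

// Build the "--- run metadata ---" lines for a finished run.
function formatRunMetadata(result: RunResult): string[] {
  // Print "n/a" instead of "undefined" when a count is missing.
  const show = (n: number | undefined) => (n === undefined ? "n/a" : String(n));
  return [
    "--- run metadata ---",
    `finishReason: ${result.finishReason}`,
    `steps: ${result.steps.length}`,
    `inputTokens: ${show(result.usage.inputTokens)}`,
    `outputTokens: ${show(result.usage.outputTokens)}`,
  ];
}

// Made-up numbers standing in for a real one-step run.
const fakeRun: RunResult = {
  text: "Lisbon in spring is lovely.",
  finishReason: "stop",
  steps: [{}],
  usage: { inputTokens: 42, outputTokens: 9 },
};
formatRunMetadata(fakeRun).forEach((line) => console.log(line));
```

In start/agent.ts a few console.log calls on the real result do the same job; the n/a fallback is only there because some local builds leave token counts undefined.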
steps is the number to keep an eye on. It’s 1 here: one request, one response. In f4
you give the agent a tool and watch it climb, because the agent has to go back to the
model once the tool has run.
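To see why a tool makes steps climb, here is a toy loop with a fake model standing in for the real one. Everything in it (fakeModel, runLoop, the weather tool, the message shape) is hypothetical and far simpler than the framework’s own loop; it only illustrates the counting: each trip to the model is one step, and a tool call forces one more trip.

```typescript
// Toy message shape, much simpler than a real framework's.
type Message =
  | { role: "system" | "user" | "assistant"; text: string }
  | { role: "tool-call"; tool: string; args: string }
  | { role: "tool-result"; tool: string; output: string };

// Hypothetical fake model: if tools are available and no tool result is
// in the list yet, it asks for the weather tool; otherwise it answers.
function fakeModel(messages: Message[], toolsAvailable: boolean): Message {
  const hasResult = messages.some((m) => m.role === "tool-result");
  if (toolsAvailable && !hasResult) {
    return { role: "tool-call", tool: "weather", args: "Lisbon" };
  }
  const result = messages.find((m) => m.role === "tool-result");
  const suffix = result && result.role === "tool-result" ? ` (${result.output})` : "";
  return { role: "assistant", text: `Go to Lisbon${suffix}.` };
}

// Call the model, run any requested tool, and repeat until the model
// returns a plain assistant message. Each model call counts as one step.
function runLoop(prompt: string, toolsAvailable: boolean) {
  const messages: Message[] = [
    { role: "system", text: "You are TripMate." },
    { role: "user", text: prompt },
  ];
  let steps = 0;
  while (true) {
    const reply = fakeModel(messages, toolsAvailable);
    steps += 1;
    messages.push(reply);
    if (reply.role !== "tool-call") break;
    // The "framework" runs the tool and appends its result for the next trip.
    messages.push({ role: "tool-result", tool: reply.tool, output: "sunny" });
  }
  return { steps, messages };
}

console.log(runLoop("Where should I go?", false).steps); // 1
console.log(runLoop("Where should I go?", true).steps); // 2
```

In this toy, the second step exists only because the first reply was a tool call, which is the same reason steps grows once f4 hands the agent a tool.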
Open start/agent.ts. It’s the f1 TripMate agent, and your job is to
read the same fields off its result; the metadata logs aren’t written for you.
Run it with npm run f2.
You get the reply, then the list of messages the model produced. Scroll up and you’ll see more above your own logs: the run, drawn as a tree of spans. That’s the trace, and it’s on for every challenge from here.
Build it
- Log the run metadata (TODO 1). Under the message loop, add a --- run metadata --- section that prints the same three fields you saw on the throwaway bot: result.finishReason, result.steps.length, and result.usage. Write the lines yourself; nothing is pre-commented. Run again and you should see finishReason: stop, steps: 1, and a usage block with the token counts. That’s the simplest possible loop: one request in, one response out.
- Watch the tokens move (TODO 2). Every call spends tokens, and what you put in the instructions is part of the bill. Add a brevity rule of your own to TripMate’s instructions (cap the answer at a sentence or so, in your own wording) and compare the counts with your TODO 1 run: fewer words out, fewer outputTokens. Now try the opposite: paste a long paragraph of persona into the instructions and watch inputTokens go up instead, because the instructions are sent on every call. Predict the direction before each run.
- Read the trace (TODO 3). In another terminal, run npm run devtools and open http://127.0.0.1:4445, then run npm run f2 again. The run appears as a tree of spans you can click into, with the model, the prompt, the response, and the token usage on each. The same spans also print above your own logs in the console. Each span shows its name, how long it took, and a row of attributes. This is autotel wired into every challenge through instrument.ts: debug: "pretty" for the console, devtools: true for the browser. The Python path shows the same spans through logfire.
- Check you’ve got it. You should be able to point at the two ai.generateText spans, read the token counts off them, and say why steps is 1 here and what would make it grow.
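If you want the TODO 2 comparison to be mechanical rather than eyeballed, a small helper can diff two usage snapshots. The Usage type and the compareUsage and direction helpers are hypothetical, and the numbers below are placeholders, not real TripMate counts; paste in the values your own runs log.

```typescript
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
}

type Direction = "up" | "down" | "same" | "unknown";

// Compare one token field between a baseline run and a changed run.
function direction(before?: number, after?: number): Direction {
  if (before === undefined || after === undefined) return "unknown";
  if (after > before) return "up";
  if (after < before) return "down";
  return "same";
}

// Report which way each count moved after an instructions change.
function compareUsage(baseline: Usage, changed: Usage) {
  return {
    inputTokens: direction(baseline.inputTokens, changed.inputTokens),
    outputTokens: direction(baseline.outputTokens, changed.outputTokens),
  };
}

// Placeholder numbers only.
const baseline: Usage = { inputTokens: 60, outputTokens: 120 };
const brief: Usage = { inputTokens: 75, outputTokens: 30 };
const persona: Usage = { inputTokens: 400, outputTokens: 120 };

console.log(compareUsage(baseline, brief)); // inputTokens: up, outputTokens: down
console.log(compareUsage(baseline, persona)); // inputTokens: up, outputTokens: same
```

Note the small rise in inputTokens on the brief run in these placeholders: a brevity rule is itself extra instruction text, and it is sent on every call too.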
Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.
- usage shows undefined token fields. Some local Ollama builds don’t report every count. steps and finishReason are the load-bearing ones; token counts are a bonus when they’re there.
- result.response.messages looks like nested objects, not a string. That’s correct. A message’s content is a list of typed parts (text now, tool-call and tool-result later). The shape is the point.
- No trace prints. Run with npm run f2, which loads the tracing bootstrap via --import ./instrument.ts. f1 ships untraced, so it stays silent.
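Because a message’s content is a list of typed parts, pulling the text out means filtering by part type. The Part and ModelMessage types below are a simplified, hypothetical version of that shape (real part objects carry more fields), and textOf is an illustrative helper, not a framework function.

```typescript
// Simplified, hypothetical part shapes; real parts carry more fields.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; input: unknown }
  | { type: "tool-result"; toolName: string; output: unknown };

interface ModelMessage {
  role: "assistant" | "tool";
  content: Part[];
}

// Join the text parts of one message, skipping any tool parts.
function textOf(message: ModelMessage): string {
  return message.content
    .filter((p): p is Extract<Part, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("");
}

const messages: ModelMessage[] = [
  { role: "assistant", content: [{ type: "text", text: "Try Lisbon in May." }] },
];
for (const m of messages) {
  console.log(m.role, JSON.stringify(m.content)); // the nested shape
  console.log("text:", textOf(m)); // just the words
}
```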
A couple of things worth knowing
Why is the answer "just the last message"?
A model returns the next message in a conversation, not a standalone answer. The framework
sends a list of messages (your instructions, the user’s prompt), the model appends one (its
reply), and result.text is the text of that appended message.
When tools arrive, the model appends a tool-call message, the framework appends a tool-result message, and the model appends another reply. The answer is always the last message in a growing list, and reading that list is how you debug every later challenge.
What exactly is a token?
Models read and write tokens: chunks of text, often a word or a piece of one. “Lisbon” might be one token; “unforgettable” might be three.
usage.inputTokens is roughly how much you sent (instructions plus prompt plus any tool
results), and usage.outputTokens is how much it wrote back. Providers bill and rate-limit
by token count, which is why “be concise” is both a style choice and a cost choice.
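To build intuition for which way inputTokens should move, a crude character-based estimate is enough. The four-characters-per-token ratio is a common rule of thumb for English text, not what any real tokenizer does, and estimateTokens and estimateInput are hypothetical helpers; the usage numbers your provider reports are the only authoritative counts.

```typescript
// Rough rule of thumb: about 4 characters per token in English text.
// Real tokenizers differ by model; treat this as an order-of-magnitude guess.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// inputTokens covers everything sent: instructions + prompt + any tool results.
function estimateInput(instructions: string, prompt: string, toolResults: string[] = []): number {
  return [instructions, prompt, ...toolResults]
    .map(estimateTokens)
    .reduce((sum, n) => sum + n, 0);
}

const prompt = "Plan a weekend in Lisbon.";
const short = "You are TripMate. Answer in one sentence.";
const persona = short + " " + "You are a warm, well-travelled guide who loves food markets. ".repeat(10);

console.log(estimateInput(short, prompt));
console.log(estimateInput(persona, prompt)); // larger: the persona rides along on every call
```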
That’s the hood open. Next up is f3, where we make the model return typed data instead of prose, so your code can use the result without parsing it.