f2: See the loop + tokens

In f1 you watched the reply arrive, first as result.text, then as a stream. Either way, that text is the headline: the last line of a short conversation the agent ran for you.

One note on delivery: this challenge calls .generate(), not .stream(). The lesson here is reading the finished run (its messages, steps, and token counts), and code that inspects a run wants the whole thing at once. Stream when a human is watching; use .generate() when code consumes the result.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run npm run f2 and see the reply plus the raw message list, but no token counts or step count yet.
  2. Edit start/agent.ts: TODO 1 (write the metadata logs: finishReason, steps, usage), TODO 2 (change instructions and watch token counts move), TODO 3 (read the same run as a span tree).
  3. Done when you can point at the two ai.generateText spans, read the token counts off them, and say why steps is 1 and what would make it grow.

In this challenge we open the hood. We’ll read that conversation, count the tokens it cost, and watch the whole run show up as a trace: live in the autotel-devtools browser viewer, and printed to your console too.

[ instructions ] + [ user: prompt ]  ->  model  ->  [ assistant: reply ]

A run is this little message list; “steps” is the number of request/response round-trips to the model, and tokens are counted across the whole list.

Every result is more than .text. Here’s the whole object, read off a one-line bot that has nothing to do with TripMate:

const bot = new ToolLoopAgent({ model, instructions: "Answer in one sentence." });
const result = await bot.generate({ prompt: "What's the tallest mountain on Earth?" });

result.text;                // the headline: the last assistant message
result.response.messages;   // what the model sent back, as structured messages
result.steps.length;        // request/response round-trips (1 with no tools)
result.finishReason;        // why it stopped ("stop" means it finished on its own)
result.usage;               // { inputTokens, outputTokens, totalTokens }

Reading the metadata is just logging the last three of those fields; writing those log lines is your TODO 1, on TripMate’s result rather than this bot’s.

steps is the number to keep an eye on. It’s 1 here: one request, one response. In f4 you give the agent a tool and watch it climb, because the agent has to go back to the model once the tool has run.
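
To make the step count concrete, here is a minimal sketch of a tool loop driven by a scripted stand-in for the model. It is not how the AI SDK implements ToolLoopAgent: runLoop, FakeModel, the message shapes, and the getWeather tool are all hypothetical names invented for this illustration. It only shows why a run with no tools takes one step and a run with one tool call takes two.

```typescript
// Hypothetical stand-ins for illustration; these are not AI SDK types.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool-call"; toolName: string };

// A fake model maps the conversation so far to exactly one turn.
type FakeModel = (messages: Message[]) => ModelTurn;

function runLoop(
  model: FakeModel,
  instructions: string,
  prompt: string,
  tools: Map<string, () => string> = new Map(),
  maxSteps = 5,
): { text: string; steps: number; messages: Message[] } {
  const messages: Message[] = [
    { role: "system", content: instructions },
    { role: "user", content: prompt },
  ];
  let steps = 0;
  while (steps < maxSteps) {
    steps += 1; // one request/response round-trip to the model
    const turn = model(messages);
    if (turn.kind === "text") {
      messages.push({ role: "assistant", content: turn.text });
      return { text: turn.text, steps, messages };
    }
    // Tool call: record it, run the tool, append its result, then loop again.
    messages.push({ role: "assistant", content: `call ${turn.toolName}` });
    const tool = tools.get(turn.toolName);
    messages.push({ role: "tool", content: tool === undefined ? "unknown tool" : tool() });
  }
  return { text: "", steps, messages }; // gave up after maxSteps
}

// No tools: the model answers straight away.
const plain = runLoop(
  () => ({ kind: "text", text: "Mount Everest." }),
  "Answer in one sentence.",
  "What's the tallest mountain on Earth?",
);

// One tool: the model asks for it first, then answers once a tool result is in the list.
const withTool = runLoop(
  (messages) =>
    messages.some((m) => m.role === "tool")
      ? { kind: "text", text: "It is sunny in Lisbon." }
      : { kind: "tool-call", toolName: "getWeather" },
  "Answer in one sentence.",
  "What's the weather in Lisbon?",
  new Map([["getWeather", () => "sunny"]]),
);

console.log(plain.steps, withTool.steps); // 1 2
```

The second run needs an extra round-trip because the model cannot write its final reply until the tool result has been appended to the list.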

Open start/agent.ts. It’s the f1 TripMate agent, and your job is to read the same fields off its result. The metadata logs aren’t written for you.

Run it:

npm run f2

You get the reply, then the list of messages the model produced. Scroll up and you’ll see more above your own logs: the run, drawn as a tree of spans. That’s the trace, and it’s on for every challenge from here.

  1. Log the run metadata (TODO 1). Under the message loop, add a --- run metadata --- section that prints the same three fields you saw on the throwaway bot: result.finishReason, result.steps.length, and result.usage. Write the lines yourself; nothing is pre-commented. Run again and you should see finishReason: stop, steps: 1, and a usage block with the token counts. That’s the simplest possible loop: one request in, one response out.

  2. Watch the tokens move (TODO 2). Every call spends tokens, and what you put in the instructions is part of the bill. Add a brevity rule of your own to TripMate’s instructions (cap the answer to a sentence or so, in your own wording) and compare the counts to step 1: fewer words out, fewer outputTokens. Now try the opposite: paste a long paragraph of persona into the instructions and watch inputTokens go up instead, because the instructions are sent on every call. Predict the direction before each run.

  3. Read the trace (TODO 3). In another terminal run npm run devtools and open http://127.0.0.1:4445, then run npm run f2 again. The run appears as a tree of spans you can click into, with the model, the prompt, the response, and the token usage on each.

    The same spans also print above your own logs in the console, something like this:

    ✓ ai.generateText            8.57s [ai]
         ai.model.id=granite4.1:3b, ai.response.finishReason=stop,
         ai.usage.inputTokens=29, ai.usage.outputTokens=254, ...
    ✓ ai.generateText.doGenerate 8.56s [ai]

    Each span shows its name, how long it took, and a row of attributes. This is autotel wired into every challenge through instrument.ts, with debug: "pretty" for the console and devtools: true for the browser. The Python path shows the same spans through logfire.

  4. Check you’ve got it. You should be able to point at the two spans, read the token counts off them, and say why steps is 1 here and what would make it grow.
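
If it helps to see the trace as data, the sample console output from step 3 can be modeled as a small tree. This is only a sketch: the Span type, renderTree, and countSpans are hypothetical names for illustration, not autotel's or the AI SDK's API, and it assumes the doGenerate span is nested under ai.generateText. The attribute names and values are copied from the sample output.

```typescript
// Hypothetical shape for illustration; real span objects carry more fields.
type Span = {
  name: string;
  durationSeconds: number;
  attributes: Record<string, string | number>;
  children: Span[];
};

// Values copied from the sample console output. The child span's attributes
// are not shown there, so they are left empty here.
const run: Span = {
  name: "ai.generateText",
  durationSeconds: 8.57,
  attributes: {
    "ai.model.id": "granite4.1:3b",
    "ai.response.finishReason": "stop",
    "ai.usage.inputTokens": 29,
    "ai.usage.outputTokens": 254,
  },
  children: [
    { name: "ai.generateText.doGenerate", durationSeconds: 8.56, attributes: {}, children: [] },
  ],
};

// Walk the tree depth-first and return one indented line per span.
function renderTree(span: Span, depth = 0): string[] {
  const lines = [`${"  ".repeat(depth)}${span.name} ${span.durationSeconds}s`];
  for (const child of span.children) {
    lines.push(...renderTree(child, depth + 1));
  }
  return lines;
}

// Count every span in the tree, including the root.
function countSpans(span: Span): number {
  let total = 1;
  for (const child of span.children) {
    total += countSpans(child);
  }
  return total;
}

console.log(renderTree(run).join("\n"));
console.log("spans:", countSpans(run)); // spans: 2
console.log("input tokens:", run.attributes["ai.usage.inputTokens"]); // input tokens: 29
```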

Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.

  • usage shows undefined token fields. Some local Ollama builds don’t report every count. steps and finishReason are the load-bearing ones; token counts are a bonus when they’re there.

  • result.response.messages looks like nested objects, not a string. That’s correct. A message’s content is a list of typed parts (text now, tool-call and tool-result later). The shape is the point.

  • No trace prints. Run with npm run f2, which loads the tracing bootstrap via --import ./instrument.ts. f1 ships untraced, so it stays silent.
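
For the undefined token fields in the first item, a defensive formatter keeps the log readable whether or not the counts are reported. This is a sketch: formatUsage is a hypothetical helper, and the Usage type assumes only the three fields named earlier, each of which may be missing. The fallback total is an estimate (input plus output), used only when no total is reported.

```typescript
// Assumed shape: the three fields named earlier, any of which may be missing.
type Usage = { inputTokens?: number; outputTokens?: number; totalTokens?: number };

function formatUsage(usage: Usage): string {
  const show = (n: number | undefined): string => (n === undefined ? "n/a" : String(n));
  // Prefer the reported total; otherwise estimate it when both parts are known.
  const total =
    usage.totalTokens ??
    (usage.inputTokens !== undefined && usage.outputTokens !== undefined
      ? usage.inputTokens + usage.outputTokens
      : undefined);
  return `input=${show(usage.inputTokens)} output=${show(usage.outputTokens)} total=${show(total)}`;
}

console.log(formatUsage({ inputTokens: 29, outputTokens: 254 })); // input=29 output=254 total=283
console.log(formatUsage({ outputTokens: 254 }));                  // input=n/a output=254 total=n/a
```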

Why is the answer "just the last message"?

A model returns the next message in a conversation, not a standalone answer. The framework sends a list of messages (your instructions, the user’s prompt), the model appends one (its reply), and result.text is the text of that appended message.

When tools arrive, the model appends a tool-call message, the framework appends a tool-result message, and the model appends another reply. The answer is always the last message in a growing list, and reading that list is how you debug every later challenge.
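
The "last message" idea can be sketched in a few lines. The shapes below are simplified and hypothetical (the real messages carry more part types and fields), and headline is an invented helper, not something the framework exports. It finds the last assistant message and joins its text parts, which is roughly what result.text gives you.

```typescript
// Hypothetical, simplified shapes for illustration only.
type TextPart = { type: "text"; text: string };
type ToolCallPart = { type: "tool-call"; toolName: string };
type ToolResultPart = { type: "tool-result"; toolName: string; output: string };
type Part = TextPart | ToolCallPart | ToolResultPart;
type ReplyMessage = { role: "assistant" | "tool"; content: Part[] };

// The headline is the text of the last assistant message in the list.
function headline(messages: ReplyMessage[]): string {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message !== undefined && message.role === "assistant") {
      return message.content
        .filter((part): part is TextPart => part.type === "text")
        .map((part) => part.text)
        .join("");
    }
  }
  return ""; // no assistant message at all
}

// Without tools the list holds a single reply.
const singleReply: ReplyMessage[] = [
  { role: "assistant", content: [{ type: "text", text: "Mount Everest." }] },
];

// With a (hypothetical) tool, the list grows: tool call, tool result, final reply.
const toolRun: ReplyMessage[] = [
  { role: "assistant", content: [{ type: "tool-call", toolName: "getWeather" }] },
  { role: "tool", content: [{ type: "tool-result", toolName: "getWeather", output: "sunny" }] },
  { role: "assistant", content: [{ type: "text", text: "It is sunny in Lisbon." }] },
];

console.log(headline(singleReply)); // Mount Everest.
console.log(headline(toolRun));     // It is sunny in Lisbon.
```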

What exactly is a token?

Models read and write tokens: chunks of text, often a word or a piece of one. “Lisbon” might be one token; “unforgettable” might be three.

usage.inputTokens is roughly how much you sent (instructions plus prompt plus any tool results), and usage.outputTokens is how much it wrote back. Providers bill and rate-limit by token count, which is why “be concise” is both a style choice and a cost choice.
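
As a worked example of the cost side, here is the arithmetic with made-up numbers. The prices are hypothetical and invented for illustration (a local model costs nothing per token, and hosted providers publish their own rates), and runCost is a hypothetical helper. The first run uses the counts from the sample trace; the second run's counts are invented to show the direction a brevity rule tends to push.

```typescript
// Hypothetical prices, invented for illustration; check your provider's real rates.
const PRICE_PER_MILLION_INPUT = 1.0;  // currency units per 1,000,000 input tokens
const PRICE_PER_MILLION_OUTPUT = 4.0; // currency units per 1,000,000 output tokens

function runCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * PRICE_PER_MILLION_INPUT + outputTokens * PRICE_PER_MILLION_OUTPUT) / 1e6;
}

// Counts from the sample trace: 29 tokens in, 254 tokens out.
const verbose = runCost(29, 254);

// Invented counts for a run with a brevity rule: slightly more input
// (the rule itself is sent on every call), far less output.
const concise = runCost(40, 30);

console.log(verbose.toFixed(6)); // 0.001045
console.log(concise.toFixed(6)); // 0.000160
```

Under these assumed prices the extra input tokens are cheap compared with the output they save, but the balance depends entirely on the real rates and on how many calls carry the instructions.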

That’s the hood open. Next up is f3, where we make the model return typed data instead of prose, so your code can use the result without parsing it.