f6: Descriptions are the interface

In f4 the agent had one tool. Real agents have several, and the model must pick the right one per question. It picks on the one thing it can read about each tool: its description. It never sees your execute. So the description is the interface it routes on, and writing a good one is the whole job here.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run npm run f6 with QUERY_TO_RUN = 1 (a packing question). getWeather has no description, so routing flounders.
  2. Edit start/agent.ts: TODO 1 (write getWeather’s description so query 1 lands on it), TODO 2 (write getFlights’s so query 2 does), then run queries 1, 2, and 3 and check that each routes to the right tool, or to none.
  3. Done when query 1 reaches getWeather, query 2 getFlights, query 3 none, you wrote zero routing logic, and you can say why a vague description breaks it.
prompt  ->  model reads each tool's DESCRIPTION  ->  best match, or none

A clear description owns a kind of question; a vague one owns nothing. There is no if.

A docs assistant with two tools:

searchDocs:  "Search the API reference. Use for how-to, syntax, and which-function questions."
runSnippet:  "Run a code snippet and return its output. Use for what-does-this-print and debugging questions."

“How do I format a date?” routes to searchDocs; “why does this print undefined?” to runSnippet. You wrote no if: each description claims a kind of question and the model matches it. Change searchDocs to "Gets data." and the same question has nothing to match, so it mis-routes. A description is a claim about which questions a tool answers. Below you write those claims for TripMate.
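The two docs-assistant tools above can be sketched in code to show what the model does and does not get to read. This is a minimal illustration, not any particular SDK's API: the `Tool` type, the stub `execute` bodies, and the `modelView` helper are all hypothetical. A real SDK typically also sends the tool's name and input schema; the schema is left out here to keep the focus on the description.

```typescript
// Hypothetical tool shape for illustration; not any specific SDK's type.
type Tool = {
  description: string;
  execute: (input: string) => Promise<string>;
};

const tools: Record<string, Tool> = {
  searchDocs: {
    description:
      "Search the API reference. Use for how-to, syntax, and which-function questions.",
    // Stub body: the model never reads this function.
    execute: async (query) => `results for: ${query}`,
  },
  runSnippet: {
    description:
      "Run a code snippet and return its output. Use for what-does-this-print and debugging questions.",
    // Stub body: the model never reads this function either.
    execute: async (code) => `output of: ${code}`,
  },
};

// Hypothetical helper: the part of each tool the model can route on.
function modelView(
  all: Record<string, Tool>,
): { name: string; description: string }[] {
  return Object.entries(all).map(([name, tool]) => ({
    name,
    description: tool.description,
  }));
}

console.log(modelView(tools));
```

Nothing in `modelView` branches on the question. Whatever routing happens comes from the model matching a prompt against those description strings.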

Open start/agent.ts. getCurrentTime is described for you, as the shape to copy:

description: "Get the current local time and part of day for a city. Use for timing and meal questions."

What it returns, and when to use it. getWeather and getFlights are blank.

Tool            Your description has to win…
getWeather      query 1, the packing question
getFlights      query 2, the flight-price question
getCurrentTime  done (the timing question)

  1. Run it blank. With QUERY_TO_RUN = 1, run npm run f6. With getWeather undescribed, the packing question has nothing to land on; watch it misfire.
  2. Write getWeather (TODO 1). In your own words, name what it returns and the questions it answers. Predict, run query 1. Lands elsewhere? Your wording overlaps another tool; tighten it.
  3. Write getFlights (TODO 2). Set QUERY_TO_RUN = 2, write it, predict, run. Expect getFlights.
  4. Run query 3. Set QUERY_TO_RUN = 3 (“capital of Portugal?”). Expect no tool: the model already knows this.
  5. Break it on purpose. Set getWeather’s description to "Gets data.", re-run query 1. On Gemini routing clearly breaks; on the granite default it is muddier (small models lean on the tool name and prompt too). Only the words changed.
  6. Check you’ve got it. Query 1 to getWeather, query 2 to getFlights, query 3 to none, and you can say why you wrote no routing code.
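The checklist in step 6 can be written down as data, which makes a misroute easy to spot. The sketch below is an assumption-laden illustration: the `Observed` type and `checkRouting` helper are hypothetical, and the observed values are typed in by hand after running each query, not read from the agent.

```typescript
// Hypothetical record: query number -> tool the model called (null = no tool).
type Observed = Record<number, string | null>;

// Expectations taken from the steps above.
const expected: Observed = {
  1: "getWeather",
  2: "getFlights",
  3: null,
};

// Return one report line per expected query.
function checkRouting(want: Observed, got: Observed): string[] {
  return Object.keys(want).map((key) => {
    const q = Number(key);
    const target = want[q] ?? null;
    const actual = got[q] ?? null; // a missing entry counts as "no tool"
    const verdict = target === actual ? "OK" : "MISROUTED";
    return `query ${q}: expected ${target ?? "none"}, got ${actual ?? "none"} ${verdict}`;
  });
}

// Example values entered by hand after three runs; not real output.
const observed: Observed = { 1: "getWeather", 2: "getFlights", 3: "getCurrentTime" };
console.log(checkRouting(expected, observed).join("\n"));
```

With the hand-entered values above, queries 1 and 2 report OK and query 3 reports MISROUTED, which is the small-model case the troubleshooting notes cover.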

Stuck? finish/agent.ts routes cleanly. Read it after you write your own.

  • Wrong tool fires. Two descriptions overlap; tighten each until it owns one kind of question.
  • Query 3 still calls a tool. Small models lean toward using any available tool; re-run it, and across runs the trend is no tool call.
  • Vague description still routes on granite. Expected. The contrast is sharp on Gemini; swap shared/model.ts to see it.
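For the "two descriptions overlap" case, a crude word-level comparison can point at the wording to tighten. This is only a rough heuristic under stated assumptions: the model does not route by counting shared words, the stop-word list is arbitrary, and the `vagueWeather` draft is a made-up example, not text from the exercise.

```typescript
// Words too common to signal overlap; an arbitrary list chosen for this sketch.
const STOP = new Set([
  "a", "an", "and", "the", "for", "of", "or", "to", "in", "use", "get", "questions",
]);

// Lowercased content words of a description.
function keywords(description: string): Set<string> {
  const words = description.toLowerCase().match(/[a-z]+/g) ?? [];
  return new Set(words.filter((w) => !STOP.has(w)));
}

// Content words that appear in both descriptions, sorted alphabetically.
function sharedWords(a: string, b: string): string[] {
  const other = keywords(b);
  return Array.from(keywords(a)).filter((w) => other.has(w)).sort();
}

const time =
  "Get the current local time and part of day for a city. Use for timing and meal questions.";
// Hypothetical weak draft for getWeather, written for this example only.
const vagueWeather = "Get current info for a city. Use for travel questions.";

console.log(sharedWords(time, vagueWeather)); // [ 'city', 'current' ]
```

The shared words here are generic ones, and the draft has nothing specific to weather. That is the signal to name what the tool returns and which questions it owns.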
How is this different from agentic (p5)?

Here the model picks one tool for one question: routing. In p5 it calls several tools for one request, in order, feeding each result into the next. Both run on descriptions like the ones you just wrote.

You can now call an agent, read its loop and cost, get typed output, give it a tool, and shape which tool it reaches for by what you write. That is the augmented LLM.

One foundation left. f7 is testing: prove this guardrail blocks the wrong question and lets the right one through, using a fake agent you inject so the check is instant, deterministic, and never makes a real call.