r1: Retrieval
You start the RAG track here. Until now the model has answered from what it absorbed in training, which is fine for “what is the capital of Portugal” and useless for “which of our destinations suits this traveller”. Retrieval-Augmented Generation fixes that. You embed your own documents, embed the question, rank your documents by how close each one is to the question, and hand the closest few to the model. The model then answers from data you chose.
The new idea is the ranking. Everything around it you have already met: the blurbs are data, and you expose the search with @agent.tool_plain, the same shape as f4.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run make r1 and read the retrieval block. For “warm beach on a budget” it returns Bali, Cancún and the Swiss Alps, because search ignores the query and returns the first three.
- Edit the body of search: embed the query (TODO 1), score every destination by cosine similarity (TODO 2), then sort and take the top k (TODO 3).
- Done when the retrieval block lists three real warm-beach destinations in descending score order, and TripMate recommends one of them.
A destination’s blurb becomes a vector. The query becomes a vector in the same space. “Close together” means “about the same thing”, and cosine similarity is how you measure it. Rank by that, keep the top few, and you have retrieval.
The top-K cutoff is the part people underestimate. The model can only recommend what you put in front of it. Retrieve the wrong three and a confident, well-written, wrong answer is exactly what you get.
Mental model
Each blurb is embedded once at startup. The query is embedded per call. Similarity ranks them; the top K is all the model ever sees.
The mechanic, in another domain
Forget the catalogue. Say you want the FAQ entry that best answers a question. Embed the answers once, embed the question, score each by cosine similarity, and keep the top few.
embed_documents embeds the answers in one batch, embed_query embeds a single query, and cosine (the helper at the top of the file) scores two vectors. The query and the docs must use the same embedder, or the scores are noise. That embed → score → sort → slice is retrieval; below you write it for TripMate’s catalogue.
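Here is the FAQ example as a minimal runnable sketch. It does not use Pydantic AI's Embedder. So that it runs offline, it swaps in a hypothetical toy embedder, toy_embed, which counts words over a fixed vocabulary. That means it matches shared words, not meaning. FAQ_ANSWERS and top_k are also made-up names. What carries over to the real thing is the embed → score → sort → slice shape.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ answers (made up for this sketch).
FAQ_ANSWERS = [
    "You can reset your password from the account settings page.",
    "Refunds are issued to the original card within five working days.",
    "Our support team is available by chat every day from 9 to 5.",
]

# Toy "vector space": one dimension per word seen in the answers.
VOCAB = sorted({w for text in FAQ_ANSWERS for w in re.findall(r"[a-z]+", text.lower())})


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in embedder: word counts over VOCAB (no semantics)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[w]) for w in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Embed the answers once, ahead of time.
answer_embeddings = [toy_embed(text) for text in FAQ_ANSWERS]


def top_k(question: str, k: int = 2) -> list[tuple[float, str]]:
    query_embedding = toy_embed(question)  # same embedder as the answers
    scored = [
        (cosine(query_embedding, emb), text)  # score every answer
        for emb, text in zip(answer_embeddings, FAQ_ANSWERS)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest first
    return scored[:k]  # only the top k would reach the model


for score, text in top_k("How do I reset my password?"):
    print(f"{score:.3f}  {text}")
```

A real embedding model would also rank “I forgot my login” close to the password answer. This toy would not, because the two texts share no words.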
The setup
Open start/agent.py. The DESTINATIONS catalogue, index_destinations (it embeds every blurb once at startup into doc_embeddings), and the search_destinations tool that wraps search are all provided. The blank is the body of search: it returns the first three and ignores the query. The three ranking lines (TODO 1, 2, 3) are yours.
Run it
You get two blocks. The retrieval block prints the “top” three for “warm beach on a budget”. The Swiss Alps is among them, because nothing compares the query to the blurbs yet. The agent block then recommends from that broken set. Fix search and both blocks come good at once.
The catalogue’s embeddings are pre-generated and committed (embeddings.embeddinggemma.json in this folder), so the first run is instant. Delete that file to regenerate them. This takes a few seconds, once, while Ollama loads embeddinggemma. A different embedder writes its own cache file. See TROUBLESHOOTING.md.
Build it
- Run it and read the gap. Run make r1. The scores are all 0.000 and the order is catalogue order, so “warm beach on a budget” returns the Swiss Alps. That is what “ignores the query” looks like.
- Embed the query (TODO 1). Put the query into the same vector space as the blurbs with one embedder.embed_query(query) call, then read the embedding off the result. Use the same embedder as the blurbs: embeddings are only comparable within one model.
- Score every destination (TODO 2). Build a list with one entry per destination, each carrying its similarity to the query. doc_embeddings[i] is the vector for DESTINATIONS[i], and cosine(...) is already waiting for you at the top of the file.
- Rank and cut (TODO 3). Sort highest-first and return the top k. The rest never reaches the model.
- Run it again. The retrieval block should now read something like 0.55 Bali, 0.48 Cancún, 0.48 Lisbon. Those are real warm-beach options in descending order, and TripMate should recommend one of them and say why. The Swiss Alps drops out because skiing and fondue are not close to “warm beach”.
- Poke the cutoff. Change search(query, 3) to 1, then to 6. At k = 1 the model sees a single option and has no choice. At k = 6 it sees three quarters of the eight-destination catalogue, including things that do not fit, and the recommendation gets vaguer. Retrieval quality is mostly about giving the model enough and no more.
- Check you’ve got it. You should be able to say, in one sentence, why the same query returned the Swiss Alps before and Bali after, and point at the line that made the difference. Look at the trace too: you will see embedding spans for the query alongside the agent run and the search_destinations tool call.
Stuck? finish/agent.py is the canonical version. Read it after you’ve had a real go.
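Once you have had a real go, compare against this sketch of one shape the three TODOs can take. It is not a copy of finish/agent.py. This page leaves one detail to you: reading the embedding off embed_query's result. The sketch sidesteps it with a hypothetical embed_query_vector callable that already returns a list of floats. The four catalogue vectors are also made up, so the example runs offline.

```python
import math
from typing import Callable

# Hypothetical stand-ins for the challenge's catalogue: made-up 3-d vectors whose
# axes loosely mean (warm, beach, snow). Real embeddinggemma vectors are far longer.
DESTINATIONS = ["Bali", "Swiss Alps", "Cancún", "Lisbon"]
doc_embeddings = [
    [0.9, 0.8, 0.0],  # Bali
    [0.0, 0.0, 1.0],  # Swiss Alps
    [0.9, 0.9, 0.0],  # Cancún
    [0.7, 0.4, 0.1],  # Lisbon
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search(
    query: str, k: int, embed_query_vector: Callable[[str], list[float]]
) -> list[tuple[float, str]]:
    # TODO 1: embed the query with the same embedder that produced doc_embeddings.
    query_embedding = embed_query_vector(query)
    # TODO 2: one (score, destination) pair per destination; index i lines up
    # doc_embeddings[i] with DESTINATIONS[i].
    scored = [
        (cosine(query_embedding, doc_embeddings[i]), DESTINATIONS[i])
        for i in range(len(DESTINATIONS))
    ]
    # TODO 3: highest score first, then keep only the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def fake_embed_query(query: str) -> list[float]:
    # Hypothetical: pretend every query lands on (warm, beach, no snow).
    return [1.0, 1.0, 0.0]


for score, name in search("warm beach on a budget", 3, fake_embed_query):
    print(f"{score:.2f} {name}")
```

With these made-up vectors the top three are Cancún, Bali and Lisbon. Called with k = 1, search returns only Cancún. With k = 4, the Swiss Alps comes back in last place with a score of 0.00. That is the cutoff trade-off from the poke step, in miniature.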
- Mixing embedding models. Vectors are only comparable if they came from the same model. Embed your docs with one and your query with another and the scores are noise.
- Trusting the top result. The closest match is still only the closest of what you have. If nothing fits, retrieval hands over the least bad option and the model presents it confidently. Watch for that when you poke the query.
- Embedding too much at once. One blurb per vector is fine here. When a document is long, a single vector becomes a blurry average and specific questions stop matching. That is exactly what r2 is about.
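Top-k never comes back empty, so the signal to watch is a low best score, not the order. Below is a minimal sketch of one way to act on that, assuming a hypothetical min_score floor. The function name, the scores and the 0.3 value are all made up. Scores are not comparable across embedders, so any real floor would have to be tuned against your own model.

```python
def confident_results(
    ranked: list[tuple[float, str]], min_score: float
) -> list[tuple[float, str]]:
    """Keep only (score, name) pairs at or above a hypothetical score floor."""
    return [(score, name) for score, name in ranked if score >= min_score]


# Made-up scores for a query that nothing in the catalogue really fits:
# top-k still hands back three ranked names, but none is a good match.
ranked = [(0.21, "Lisbon"), (0.19, "Bali"), (0.12, "Swiss Alps")]

kept = confident_results(ranked, min_score=0.3)  # arbitrary floor; tune per embedder
if kept:
    print("Pass these to the model:", kept)
else:
    print("Nothing clears the bar; say so instead of recommending the least bad option.")
```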
A couple of things worth knowing
Embeddings, in one paragraph
An embedding model turns text into a list of numbers (a vector) so that text about similar things lands in similar places. “Warm beach on a budget” and “Caribbean white-sand beaches, great-value resorts” end up close; “snow-sure skiing, fondue” ends up far. Cosine similarity is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. Because the lengths divide out, it ignores length and asks only “do these point the same way”. You never read the numbers yourself; you only ever compare them.
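Here is a worked check of the “ignores length” claim in plain Python. The three-dimensional vectors are made up; real embeddings have hundreds of dimensions. This cosine assumes neither vector is all zeros.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|): direction only, length cancels out."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


a = [1.0, 2.0, 0.0]
print(round(cosine(a, [2.0, 4.0, 0.0]), 6))    # same direction, twice as long: 1.0
print(round(cosine(a, [0.0, 0.0, 3.0]), 6))    # nothing in common (orthogonal): 0.0
print(round(cosine(a, [-1.0, -2.0, 0.0]), 6))  # opposite direction: -1.0
```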
Why retrieval is a tool
Nothing new happened at the agent level. You wrote a @agent.tool_plain function with a clear docstring, which the model reads as the tool’s description, and it returned some data. The model decided to call it. The only difference from f4 is that the tool does a similarity search instead of returning a mock. That is the whole trick: RAG is retrieval inside the tool loop. If the model never calls the tool, the problem is almost always the docstring or the instructions, not the embeddings.
embed_query vs embed_documents
Pydantic AI’s Embedder gives you both. They can produce slightly different vectors for the same text. Some embedding models are trained to put a question close to the passage that answers it, even though the two are worded differently. Use embed_documents for the things you store and embed_query for the thing you search with. Here the difference is small, but using the right one is a free habit worth keeping.
Index once, query many
index_destinations runs once at startup and embeds all eight blurbs. search only embeds the query. That split matters: embedding documents is the slow, expensive part, so you do it ahead of time and cache the vectors. Here that cache is a JSON file in the challenge folder, pre-generated and committed so the first run skips the embed. In production it is a vector database. The shape is identical; only the store changes.
Next up is r2, where the documents get long. One embedding for a whole multi-paragraph guide is too blunt: you chunk the guide into passages, embed those, and retrieve the paragraph that answers the question.