
Resilience: failures as data

Self-serve track. Not part of the live 90-minute block; do it any time after Foundations. Run with make resilience (reference: make solution-resilience).

Every tool so far has worked. Real APIs go down, and users type places that do not exist. This challenge is about what the agent does when a tool fails.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run make resilience and watch the run crash on Atlantis, because get_weather raises for an unknown city.
  2. Edit start/agent.py: do TODO 1 (replace the raise in get_weather with a {"error": ...} return carrying a recovery hint) and TODO 2 (add a recovery clause to the instructions).
  3. Done when the run no longer crashes, the model names the Atlantis failure, and it offers a real alternative instead of stopping.

You ask TripMate to plan a weekend in Atlantis, which no tool can find, and you turn that failure into something the model can handle well.

The one idea here: return failures as data, do not raise them. A returned {"error": "..."} is a normal tool result the model reads and acts on, and the message is one you wrote, so you can include recovery guidance. A raised exception is different: it escapes the tool, agent.run(...) propagates it, and your program stops.

Forget travel. Say a charge_card tool can fail. The instinct is to raise:

async def charge_card(amount: float) -> dict:
    ok = await charge(amount)
    if not ok:
        raise ValueError("charge failed")   # escapes the tool and takes the run down
    return {"ok": True}

Pydantic AI does not swallow an ordinary exception raised inside a tool, so the ValueError above ends the whole run. Return the failure as data instead, with a message and recovery guidance you control:

async def charge_card(amount: float) -> dict:
    ok = await charge(amount)
    if not ok:
        return {"error": "<name what failed, and tell the model what to do next>"}
    return {"ok": True}

The returned version keeps the agent alive and gives the model something to act on, and the wording is yours. That wording is the whole lesson, so it is yours to write, not to copy. Below you do this to TripMate’s get_weather.
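
To see the difference without an agent in the loop, here is a minimal plain-Python sketch of the two versions side by side. The charge stub, the call_tool helper, and the error wording are all assumptions made up for illustration; call_tool only stands in for "whatever is calling the tool" and is not how Pydantic AI dispatches tools.

```python
import asyncio


async def charge(amount: float) -> bool:
    """Hypothetical payment stub: always declines, so the failure path runs."""
    return False


async def charge_card_raising(amount: float) -> dict:
    if not await charge(amount):
        raise ValueError("charge failed")  # leaves the tool as an exception
    return {"ok": True}


async def charge_card_returning(amount: float) -> dict:
    if not await charge(amount):
        # Illustrative wording only; in the challenge you write your own.
        return {"error": "Card was declined. Ask the user for another payment method."}
    return {"ok": True}


async def call_tool(tool, amount: float) -> str:
    """Report how the failure reaches the caller: as an exception or as a value."""
    try:
        result = await tool(amount)
    except ValueError as exc:
        return f"raised: {exc}"
    return f"returned: {result}"


raised_outcome = asyncio.run(call_tool(charge_card_raising, 10.0))
returned_outcome = asyncio.run(call_tool(charge_card_returning, 10.0))
print(raised_outcome)
print(returned_outcome)
```

The raising version only yields something usable because call_tool wraps it in try/except. The returning version hands back an ordinary value that any caller, including a model, can read.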

Open start/agent.py. get_weather raises for any city it does not know, and the prompt asks about Atlantis, which no tool can find. TODO 1 is yours: replace the raise with a return {"error": ...} carrying a recovery hint. TODO 2: add a clause to the agent’s instructions telling TripMate what to do with an error field.

make resilience
  1. Run it and watch it crash. Run make resilience. get_weather raises for Atlantis, the exception is not caught, and the run falls over: you see “TripMate crashed on a tool error”. One unknown city took the whole plan down, and you had no say in what happened next.

  2. Return the error as data (TODO 1). In get_weather, replace the raise with a return {"error": ...} whose message names what failed and tells the model what to do next (suggest a real city, or continue without the weather). The exact wording is yours to write, and it is the whole point: the model can only act on what your message says. The TODO comment in start/agent.py marks the spot.

  3. Tell the agent what to do with an error field (TODO 2). Add a clause to the instructions, in your own words, telling TripMate what to do when a tool returns an error field: acknowledge the failure plainly, then suggest a real alternative or continue with what worked. Write the sentence yourself; there is a marked spot in the instructions string.

    Run again. The run no longer crashes. The model gets a message you wrote, names the failure (“no weather data for Atlantis, it is not a real destination”), and offers a real alternative. Both tools fail for Atlantis, so there is nothing to confabulate around: the only honest response is to surface the failure.

  4. Run the bare-vs-hinted poke. Temporarily change your error message to just "No weather data." Predict how TripMate recovers, then run. With nothing to act on, the recovery goes thin: it apologises and stalls. Put your hinted message back and run again. Same failure, two recoveries. The model can only act on what your message says, which is the whole reason you return the error instead of raising.

  5. Verify what you’ve got. The first run crashes on the raised error. After your changes, get_weather returns {"error": ...} with a recovery hint, the instructions carry your recovery clause, and the agent names the Atlantis failure and offers a real alternative without crashing. You should be able to say why returning an error beats raising one, and when ModelRetry is the right call instead.
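
The bare-vs-hinted comparison in step 4 can be sketched on a different tool, so the TripMate wording stays yours to write. Everything here is hypothetical: the get_exchange_rate tool, the tool_error helper, and the made-up RATES table are assumptions for illustration, not part of start/agent.py.

```python
RATES = {"EUR": 0.9, "GBP": 0.8}  # made-up numbers for the sketch


def tool_error(what_failed: str, next_step: str) -> dict:
    """Build an error result that names the failure and suggests a recovery."""
    return {"error": f"{what_failed} {next_step}"}


def get_exchange_rate(currency: str) -> dict:
    rate = RATES.get(currency)
    if rate is None:
        # Returned, not raised: the caller receives a normal result.
        return tool_error(
            f"No exchange rate for {currency!r}: it is not a supported currency.",
            f"Suggest one of {sorted(RATES)} or continue without a conversion.",
        )
    return {"currency": currency, "rate": rate}


bare = {"error": "No rate."}       # gives the model nothing to act on
hinted = get_exchange_rate("XYZ")  # names the failure and a next step
print(bare)
print(hinted)
```

Both results have the same shape. Only the hinted one tells the reader what failed and what to try next, and that difference is what step 4 asks you to observe.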

  • The recovery is thin even after you return the error. Put more in the error message. The model can only act on what the message says; a bare "error: failed" gives it nothing to work with.
  • A small model still pushes ahead and invents. granite4.1:3b sometimes does, if any tool succeeded and gave it material. Both tools fail for Atlantis, so there’s nothing to confabulate around: the only honest response is to surface the failure.
  • Reaching for ModelRetry on a permanent failure. It retries the same doomed call until the budget runs out, then Pydantic AI raises UnexpectedModelBehavior. Use it only when another attempt could actually work; return errors as data for failures that never will.

What about ModelRetry?

Pydantic AI has ModelRetry: raise it from a tool and the model gets your message and tries the call again. It is the right tool for a transient or fixable failure: a validation miss the model can correct, a rate limit worth retrying, “you passed an unknown airport code, try the IATA code.” It is the wrong tool for a permanent failure like Atlantis: the model retries the same doomed call until it exhausts the retry budget, then Pydantic AI raises UnexpectedModelBehavior and the run crashes anyway. For a failure that will never succeed, return it as data so the model adapts instead of retrying. Reach for ModelRetry only when another attempt could actually work.
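
As a rough sketch of that split, the hypothetical find_flight function below raises ModelRetry for a fixable input (a city name where an IATA code is expected) and returns an error for a code it treats as permanently unavailable. The function name, the FARES table, and the messages are assumptions for illustration. The function is shown on its own rather than registered on an agent, and the fallback class exists only so the sketch runs where pydantic-ai is not installed.

```python
try:
    from pydantic_ai import ModelRetry
except ImportError:
    # Stand-in so the sketch runs without pydantic-ai installed.
    class ModelRetry(Exception):
        pass

FARES = {"LIS": 120.0, "OSL": 95.0}  # made-up fares for the sketch


def find_flight(destination_code: str) -> dict:
    code = destination_code.strip()
    if not (len(code) == 3 and code.isalpha() and code.isupper()):
        # Fixable: the model can correct its argument, so another attempt may work.
        raise ModelRetry(
            f"{destination_code!r} is not an IATA code. "
            "Call again with a three-letter code such as 'LIS'."
        )
    fare = FARES.get(code)
    if fare is None:
        # Treated as permanent here: retrying the same code cannot succeed.
        return {
            "error": f"No flights to {code}. Do not retry this code; "
            "suggest a different destination or continue without flights."
        }
    return {"destination": code, "fare_eur": fare}


print(find_flight("LIS"))
print(find_flight("ZZZ"))
```

Calling find_flight("Lisbon") raises ModelRetry, because a corrected argument could succeed. find_flight("ZZZ") returns an error dict, because no retry of that code will.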

Why make both tools fail for Atlantis?

If the flight lookup had succeeded for Atlantis, the model would have a real price to build a pitch around and would bury the weather failure under a confident itinerary, especially a small model. Making every tool fail for the fake destination removes the material to confabulate with, so the only honest response is to surface the failure.

These three self-serve tracks sit outside the numbered path. When you’re done here, head back to the main tracks, foundations f1–f5 and patterns p1–p7.