Output guardrails (middleware)

Self-serve track. The output half of f5’s guardrail. Not part of the live 90-minute block; do it any time after Foundations f5. Run with npm run guardrails-middleware (reference: npm run solution:guardrails-middleware).

In f5 you built an input guardrail: a cheap check that refused off-topic or unsafe requests before the model ran. This is the other half. An output guardrail inspects what the model generated and cleans it before the reply leaves your system, here by redacting personal data from the text.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run npm run guardrails-middleware and watch TripMate read the traveller’s email and phone number straight back into the reply.
  2. Edit start/agent.ts: write a wrapGenerate middleware that redacts email and phone from the text, wrap model with wrapLanguageModel, and give the wrapped model to the agent.
  3. Done when the same reply comes back with <redacted email> and <redacted phone> in place of the real values, and you did not change the agent or its prompt to get there.

agent.generate()  ->  [ model produces text ]  ->  [ wrapGenerate redacts ]  ->  reply
                          (the real call)            (middleware, on the way out)

f5 gated the input; this filters the output. The agent does not know the filter is there: you wrap the model once and hand it over.

Middleware is the AI SDK’s language-model-agnostic way to add cross-cutting behavior: guardrails, logging, caching, RAG. You build it once against the model interface and it works with any provider. Here’s the entire shape on a throwaway middleware that just SHOUTS every reply, nothing to do with redaction:

import { wrapLanguageModel, type LanguageModelMiddleware } from "ai";

const shout: LanguageModelMiddleware = {
  specificationVersion: "v3",                       // required by the middleware type
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();              // run the real model call
    return {
      ...result,
      content: result.content.map((part) =>         // v6: text lives in content parts
        part.type === "text" ? { ...part, text: part.text.toUpperCase() } : part,
      ),
    };
  },
};

const loudModel = wrapLanguageModel({ model, middleware: shout });   // drop-in replacement

Two moves to carry over to the redaction guardrail: a wrapGenerate that awaits doGenerate() and maps the text parts of result.content (passing non-text parts through), and a wrapLanguageModel that wraps your model so the agent never knows the filter is there. Your version is the same shape with redact(part.text) in place of .toUpperCase(), plus the redact helper you write.

npm run guardrails-middleware
  1. Run it and watch the leak. Run npm run guardrails-middleware. The prompt hands TripMate an email and a phone number and asks it to confirm them back. With the raw model, it does exactly that, and the contact details land in the reply.

  2. Write the middleware (TODO). Add a redact helper (two regexes are enough: one for email, one for phone) and a piiGuardrail middleware whose wrapGenerate awaits doGenerate() and returns the result with its text parts redacted. Import wrapLanguageModel and the LanguageModelMiddleware type from ai.

  3. Wrap the model and hand it over. Build safeModel with wrapLanguageModel and pass it as the agent’s model. Run again: the reply now shows <redacted email> and <redacted phone>, and you did not touch the agent’s instructions or the prompt to get there. That is the point of middleware: the guardrail is a property of the model, not something every caller has to remember.
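
The redaction part of steps 2 and 3 can be sketched as follows. This is a minimal sketch, not the contents of finish/agent.ts. The two regexes are deliberately simple assumptions (the phone pattern will also catch other long digit runs such as dates or order numbers), ContentPart is a local stand-in for the SDK's content part types so the snippet runs without the ai package, and redactContent is a hypothetical helper name.

```typescript
// Local stand-ins for the SDK's content parts (assumption: only the fields
// used here; the real types carry more).
type TextPart = { type: "text"; text: string };
type ToolCallPart = { type: "tool-call"; toolName: string };
type ContentPart = TextPart | ToolCallPart;

// Deliberately simple patterns: enough for the exercise, not for production.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\(?\d[\d\s().-]{7,}\d/g;

// Replace every email address and phone number with a placeholder.
// Emails go first so digits inside an address are never seen by PHONE.
function redact(text: string): string {
  return text
    .replace(EMAIL, "<redacted email>")
    .replace(PHONE, "<redacted phone>");
}

// Redact text parts; pass every other part through untouched.
function redactContent(content: ContentPart[]): ContentPart[] {
  return content.map((part) =>
    part.type === "text" ? { ...part, text: redact(part.text) } : part,
  );
}

console.log(redact("Reach me at jane@example.com or +1 415 555 0100."));
// prints "Reach me at <redacted email> or <redacted phone>."
```

Inside piiGuardrail's wrapGenerate, the returned object would then be the awaited result spread with its content mapped this way (adjusted to the SDK's real part types), and safeModel built with wrapLanguageModel is what the agent receives.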

Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.

  • Nothing is redacted. You wrapped the model but left the agent pointing at the raw model, or you returned the original result instead of the mapped copy. The middleware only matters if the wrapped model is the one the agent uses.

  • You looked for result.text. Older examples destructure const { text } = await doGenerate(). In AI SDK v6 the text is in result.content as { type: "text", text } parts; there is no top-level text on the generate result.
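
To make the second pitfall concrete, here is a small sketch of reading the reply text out of content parts, which is handy for logging what the middleware actually saw. The part types are local stand-ins (an assumption, so the snippet runs without the ai package), and textOf is a hypothetical helper name.

```typescript
// Local stand-ins for the SDK's content parts (assumption: simplified).
type TextPart = { type: "text"; text: string };
type OtherPart = { type: "tool-call"; toolName: string };
type ContentPart = TextPart | OtherPart;

// Join the text of every text part; ignore everything else.
function textOf(content: ContentPart[]): string {
  return content
    .filter((part): part is TextPart => part.type === "text")
    .map((part) => part.text)
    .join("");
}

const content: ContentPart[] = [
  { type: "text", text: "Confirmed: " },
  { type: "tool-call", toolName: "lookupBooking" },
  { type: "text", text: "jane@example.com" },
];

console.log(textOf(content)); // prints "Confirmed: jane@example.com"
```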

Why middleware instead of editing the reply after .generate()?

You could redact the final text yourself at the call site (the result that agent.generate() returns does expose a top-level text, unlike the model-level result inside wrapGenerate), and for one script that is fine. Middleware wins when the rule has to hold everywhere: wrap the model once and every agent and every call that uses it inherits the filter, so no caller can forget it. It is also reusable and provider-agnostic, so the same guardrail works whether you are on Ollama or Google. Logging, caching, and RAG belong at the same seam for the same reason.
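
A toy illustration of the "wrap once, every caller inherits it" argument. Everything here is a local stand-in, not the AI SDK: Model, Middleware, and wrapModel are hypothetical miniatures with just enough shape to show the seam, and the two "agents" are plain functions.

```typescript
// Toy stand-ins, NOT the AI SDK types (assumption: minimal shape only).
type TextPart = { type: "text"; text: string };
type Result = { content: TextPart[] };
type Model = { doGenerate: () => Promise<Result> };
type Middleware = {
  wrapGenerate: (args: { doGenerate: () => Promise<Result> }) => Promise<Result>;
};

// Hypothetical miniature of what a model wrapper does.
function wrapModel(model: Model, middleware: Middleware): Model {
  return {
    doGenerate: () =>
      middleware.wrapGenerate({ doGenerate: () => model.doGenerate() }),
  };
}

// A fake model that always leaks an email address.
const rawModel: Model = {
  doGenerate: async () => ({
    content: [{ type: "text", text: "Your email is jane@example.com." }],
  }),
};

const piiGuardrail: Middleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate(); // run the wrapped model
    return {
      ...result,
      content: result.content.map((part) => ({
        ...part,
        text: part.text.replace(
          /[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}/g,
          "<redacted email>",
        ),
      })),
    };
  },
};

const safeModel = wrapModel(rawModel, piiGuardrail);

// Two independent callers; neither contains any redaction logic.
async function bookingAgent(model: Model): Promise<string> {
  return (await model.doGenerate()).content.map((p) => p.text).join("");
}
async function supportAgent(model: Model): Promise<string> {
  const reply = (await model.doGenerate()).content.map((p) => p.text).join("");
  return "Support: " + reply;
}
```

Passing safeModel to either function yields redacted text, while passing rawModel yields the leak. The callers are identical in both cases, which is the property the call-site approach cannot give you.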

Why is streaming harder?

wrapGenerate sees the whole reply at once, so redacting it is a string replace. The streaming hook, wrapStream, sees the text in pieces as it is produced, and a value you want to catch (an email, a card number) can be split across chunks. Doing it properly means buffering and matching across the stream, which is why output guardrails are easiest to reason about on the non-streaming path, and why f5 chose an input gate over filtering a stream.
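
The chunk-boundary problem can be seen with plain strings standing in for stream chunks. This is a sketch under assumptions: it does not use wrapStream at all, createEmailStreamRedactor is a hypothetical helper, and it only handles emails. It relies on an email containing no whitespace, so holding back the trailing partial word is enough; phone numbers, which can contain spaces, would need a different buffering rule.

```typescript
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmail(text: string): string {
  return text.replace(EMAIL, "<redacted email>");
}

// Hypothetical helper: emits only text that ends on a whitespace boundary
// and holds back the trailing partial word until more text arrives.
function createEmailStreamRedactor() {
  let pending = "";
  return {
    push(chunk: string): string {
      pending += chunk;
      // Everything up to and including the last space or newline is complete.
      const cut =
        Math.max(pending.lastIndexOf(" "), pending.lastIndexOf("\n")) + 1;
      const ready = pending.slice(0, cut);
      pending = pending.slice(cut);
      return redactEmail(ready);
    },
    // Call once the stream ends to release whatever is still held back.
    flush(): string {
      const rest = redactEmail(pending);
      pending = "";
      return rest;
    },
  };
}

// The address is split across two chunks.
const chunks = ["Your email is jane@exa", "mple.com, confirmed."];

// Naive: redact each chunk on its own. Neither half looks like an email.
const naive = chunks.map(redactEmail).join("");

// Buffered: match across the chunk boundary.
const redactor = createEmailStreamRedactor();
const buffered =
  chunks.map((chunk) => redactor.push(chunk)).join("") + redactor.flush();

console.log(naive);    // prints "Your email is jane@example.com, confirmed."
console.log(buffered); // prints "Your email is <redacted email>, confirmed."
```

The cost of the buffered version is latency: text is released a word late, which is the trade-off that makes the non-streaming path simpler to reason about.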

That is the output half of guardrails. With f5’s input gate in front and a filter like this behind, you have both sides of the pattern a production agent usually wants.