Agent Middleware in Microsoft Agent Framework 1.0
A familiar pipeline pattern applied to AI agents
Covers all three middleware types, registration scopes, termination, result override, and when to use each
Not a New Idea
If you have used ASP.NET Core or Express.js, you already understand the core concept. Both frameworks let you register a chain of functions around every request. Each function receives a context and a next() delegate. Calling next() continues the chain; not calling it short-circuits the pipeline. That is the pipeline pattern: a clean way to apply cross-cutting concerns like logging, authentication, and error handling without touching any business logic.
Microsoft’s Agent Framework applies this exact pattern to AI agents. The next() delegate becomes call_next(), the context object holds the agent’s conversation instead of an HTTP request, and the pipeline wraps an AI reasoning turn instead of a web request. If you know app.Use() or app.use(), you already know the shape of what follows.
What is new, and worth understanding deeply, is that an agent turn is not a single request/response cycle. It is a multi step reasoning loop, and Agent Framework exposes three distinct interception points within it. The rest of this post covers all three types, how they differ, when to use each, and how they come together in a real SQL agent example.
Middleware
The Agent Framework supports three types of middleware, each intercepting a different layer of execution:
- Agent middleware wraps agent runs, giving you access to inputs, outputs, and overall control flow.
- Function middleware wraps individual tool calls, enabling input validation, result transformation, and execution control.
- Chat middleware wraps the underlying requests sent to AI models, exposing raw messages, options, and responses.
All three types support both function based and class based implementations.
Chaining
When multiple middleware of the same type are registered, they execute as a chain: each middleware calls call_next() to hand off control to the next one in line.
Rather than passing updated values into call_next() as arguments, middleware mutates the shared context object directly. This means any changes you make to the context before calling call_next() are automatically visible to downstream middleware, with no need to thread values through the call explicitly.
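This mutation-based contract can be sketched in plain Python. Everything here is an illustrative stand-in (no framework imports): the dict context and the run_chain helper simulate what the framework does internally.

```python
import asyncio

async def add_trace_id(context: dict, call_next) -> None:
    context["trace_id"] = "abc-123"  # mutate the shared context before handing off
    await call_next()                # downstream middleware sees the change

async def log_trace_id(context: dict, call_next) -> None:
    # The value set by the upstream middleware is already visible here,
    # even though nothing was passed into call_next().
    context["seen_downstream"] = context.get("trace_id")
    await call_next()

async def run_chain(middlewares, context):
    async def terminal():            # stand-in for the agent itself
        context["done"] = True
    nxt = terminal
    for mw in reversed(middlewares):
        # Bind the current middleware and its downstream continuation.
        nxt = (lambda m, n: (lambda: m(context, n)))(mw, nxt)
    await nxt()

ctx: dict = {}
asyncio.run(run_chain([add_trace_id, log_trace_id], ctx))
print(ctx["seen_downstream"])  # abc-123
```

The downstream middleware never received the trace ID as an argument; it simply read it off the shared context.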
Execution Order
Agent level middleware always wraps run level middleware. Given agent middleware [A1, A2] and run middleware [R1, R2], the execution order is:
```
A1 → A2 → R1 → R2 → Agent → R2 → R1 → A2 → A1
```
Function and chat middleware follow the same wrapping principle, applied at the time of each tool call or chat request respectively.
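The wrapping order above can be simulated with plain Python. This is an illustrative sketch, not framework code: each layer records when it enters and exits.

```python
import asyncio

order: list[str] = []

def make_mw(name):
    async def mw(call_next):
        order.append(f"{name}:before")
        await call_next()
        order.append(f"{name}:after")
    return mw

async def agent_core():
    order.append("Agent")

async def run_pipeline(mws, core):
    nxt = core
    for mw in reversed(mws):
        # Each layer wraps the one after it, innermost built first.
        nxt = (lambda m, n: (lambda: m(n)))(mw, nxt)
    await nxt()

# Agent-level [A1, A2] wraps run-level [R1, R2].
asyncio.run(run_pipeline([make_mw(n) for n in ("A1", "A2", "R1", "R2")], agent_core))
print(" → ".join(order))
```

Running this prints the same before/after sequence shown above: every layer's "after" code runs in reverse registration order as the call stack unwinds.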
Why we need it
The biggest value is not convenience; it is correctness and consistency.
Without middleware, teams usually end up in one or both of these patterns:
Pattern 1: policy hidden in prompts
Example instruction:
"Never run destructive SQL. Never send data to personal email."
This is useful guidance, but it is still model behavior, not a hard gate. As prompts get long, tools increase, and edge cases appear, this policy can become inconsistent. It is also hard to audit after the fact.
Pattern 2: policy duplicated in each tool
```python
def run_sql(query: str) -> str:
    if "drop" in query.lower():
        return "blocked"
    ...

def export_data(target: str) -> str:
    if "gmail.com" in target.lower():
        return "blocked"
    ...

def quote_inventory_line(quantity: int) -> str:
    if quantity > 10000:
        return "blocked"
    ...
```
This looks safe, but it creates:
- duplicated logic
- inconsistent rules across tools
- expensive updates when policy changes
Middleware fixes both
With middleware, concerns live at the right boundary:
- run-level checks in Agent middleware
- per-tool checks in Function middleware
- model-call telemetry/metadata in Chat middleware
Result:

- cleaner tools
- stronger guardrails
- easier tests
- better observability
1. Agent Middleware: The Outermost Layer
Agent middleware is the outermost layer of the pipeline. It fires once per turn, before any LLM call is made and after the final response is produced, making it the right place for concerns that span the entire turn: input validation, security screening, audit logging, and output transformation.
Implementation Styles & Chaining
Agent middleware supports both class-based and function-based implementations; the two are fully equivalent, and the choice comes down to whether you need instance state or prefer a lighter syntax. When multiple middleware components are registered, they form a chain. Each component is responsible for calling call_next() to pass control to the next layer; omitting this call short-circuits the pipeline, preventing any downstream middleware or the LLM from running.
Note that call_next() takes no arguments. Instead of passing updated values explicitly, middleware mutates the shared AgentContext object directly: any changes made before await call_next() are automatically visible to everything further down the chain.
Class-Based Implementation
Subclass AgentMiddleware and override process(). The example below shows SecurityAgentMiddleware: it inspects the latest user message and short-circuits the pipeline if it detects a threat, so the LLM is never invoked for blocked requests.
```python
class SecurityAgentMiddleware(AgentMiddleware):
    """Agent-level guard: blocks risky user chat text before the model runs.

    Inspects context.messages[-1] (latest user turn). If unsafe_input_reason
    returns a reason, sets context.result to a canned assistant reply and does
    not call call_next(), so the LLM and tools are skipped for that turn.
    """

    async def process(
        self,
        context: AgentContext,
        call_next: Callable[[], Awaitable[None]],
    ) -> None:
        # Only the latest user utterance is checked (typical for a single-turn REPL).
        last_message = context.messages[-1] if context.messages else None
        if last_message and last_message.text:
            query = last_message.text
            reason = unsafe_input_reason(query)
            if reason:
                print(f"[SecurityAgentMiddleware] Security Warning: {reason}; blocking request.")
                # Short-circuit: set the assistant reply here; do NOT call
                # call_next() -> no LLM, no tools.
                context.result = AgentResponse(
                    messages=[
                        Message(
                            "assistant",
                            [f"Request blocked: {reason}."],
                        )
                    ]
                )
                return
        print("[SecurityAgentMiddleware] Security check passed.")
        # Continue pipeline: model + optional run_sql; function middleware runs inside tool path.
        await call_next()
```
Here is the unsafe_input_reason function. For brevity, the helper predicates it calls are omitted.
```python
def unsafe_input_reason(query: str) -> str | None:
    """Classify why a user message should be blocked, or None if it may proceed.

    Checks run in order: injection-style patterns first, then destructive
    natural language.
    """
    # Order matters: catch obvious SQL fragments before broader NL patterns.
    if _looks_like_dangerous_sql(query):
        return "injection-style or suspicious SQL fragment in your message"
    if _looks_like_destructive_database_intent(query):
        return "destructive database request (e.g. delete/drop/truncate)"
    return None
```
Function-Based and Decorator-Based Styles
Agent Framework also supports function-based and decorator-based implementations. All three styles are equivalent; choose based on whether you need state or explicit type annotations.
Function-based
```python
async def logging_agent_middleware(
    context: AgentContext,
    next: Callable[[AgentContext], Awaitable[None]],
) -> None:
    print("[Agent] Turn starting")
    await next(context)
    print("[Agent] Turn completed")
```
Decorator-based (no type annotation required)
```python
@agent_middleware
async def simple_agent_middleware(context, next):
    print("Before agent execution")
    await next(context)
    print("After agent execution")
```
Registering Middleware
Middleware is registered when constructing the agent. Pass a list to the middleware argument; different middleware types can be mixed in the same list, and the framework routes each to the correct pipeline layer automatically:
```python
FOUNDRY_PROJECT_ENDPOINT = "https://sreeniagent.services.ai.azure.com/api/projects/sreeni_foundry"
FOUNDRY_MODEL = "gpt-4.1"

async with (
    AzureCliCredential() as credential,
    Agent(
        client=FoundryChatClient(
            credential=credential,
            project_endpoint=FOUNDRY_PROJECT_ENDPOINT,  # Your Microsoft Foundry project URL
            model=FOUNDRY_MODEL,  # The model you deployed
        ),
        name="Sreeni-SqlAssistant",
        instructions=(
            "You help users query a small demo database. "
            "The only table is customers with columns id, name, city. "
            "Always use the run_sql tool with a proper SELECT; explain results briefly."
        ),
        tools=run_sql,
        # Agent middleware wraps the turn; function middleware wraps each tool call
        middleware=[SecurityAgentMiddleware(), LoggingFunctionMiddleware()],
    ) as agent,
):
```
When to Use Agent Middleware
Agent middleware is the right choice for any concern that applies to the turn as a whole, rather than to a specific tool call or model request.
2. FunctionMiddleware: The Tool-Call Layer
FunctionMiddleware fires inside the agent turn, but only when the LLM decides to invoke a tool. A single agent turn can trigger multiple tool calls, and FunctionMiddleware wraps each one independently. This makes it the right place for concerns that are specific to tool execution: timing, input validation, result transformation, and tool call auditing.
The FunctionInvocationContext Object
Each FunctionMiddleware component receives a FunctionInvocationContext, which is scoped to a single tool invocation: it exposes the function being called, its arguments, the result slot, and a terminate flag for controlling the tool-calling loop.
When to Use FunctionMiddleware
Use it for concerns specific to tool execution: timing and performance monitoring, validating or sanitising tool arguments before the tool runs, capping the number of times a tool may be called in one turn, transforming tool results before the LLM sees them, or auditing exactly which tools were called and with what arguments.
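As a concrete illustration of the timing case, here is a sketch of a timing middleware. The context class below is a stand-in for the framework's FunctionInvocationContext, so the field names are assumptions made for the demo.

```python
import asyncio
import time
from dataclasses import dataclass, field

@dataclass
class FakeFunctionContext:  # stand-in for FunctionInvocationContext
    function_name: str
    result: object = None
    metadata: dict = field(default_factory=dict)

async def timing_function_middleware(context, call_next) -> None:
    start = time.perf_counter()
    await call_next()  # run the tool (and any inner middleware)
    context.metadata["elapsed_s"] = time.perf_counter() - start
    print(f"[Function] {context.function_name} took {context.metadata['elapsed_s']:.3f}s")

async def fake_tool(context) -> None:
    await asyncio.sleep(0.01)  # simulate tool work
    context.result = "42 rows"

ctx = FakeFunctionContext(function_name="run_sql")
asyncio.run(timing_function_middleware(ctx, lambda: fake_tool(ctx)))
```

Because the middleware wraps each tool call independently, a turn with three tool calls would produce three separate timing records.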
Terminating the Function-Calling Loop
Setting context.terminate = True inside FunctionMiddleware does something powerful: it stops the LLM’s function calling loop entirely. The LLM will not receive the tool result and will not make any further tool calls in this turn. This is useful for enforcing tool call budgets or stopping a loop that is going in an undesirable direction.
```python
@function_middleware
async def budget_middleware(context, next):
    if context.function.name == "run_sql":
        # Allow at most one SQL query per turn
        call_count = context.metadata.get("sql_calls", 0)
        if call_count >= 1:
            context.result = "Query limit reached for this turn."
            context.terminate = True  # stop the LLM tool-calling loop
            return
        context.metadata["sql_calls"] = call_count + 1
    await next(context)
```
Warning: Termination and Chat History. Terminating the function-calling loop can leave the chat history in an inconsistent state: a tool-call message with no corresponding tool result. This may cause errors if the same history is used in subsequent agent runs. Use termination carefully and consider clearing or repairing the history afterward.
3. ChatMiddleware: The LLM-Call Layer
ChatMiddleware is the deepest layer. It wraps the actual inference call sent to the underlying language model: the raw list of messages, the model options, and the response that comes back. This layer fires for every call to the LLM within a turn, which can be more than one if tools are used.
The ChatContext Object
Each ChatMiddleware component receives a ChatContext.
Function-Based Example
```python
async def logging_chat_middleware(
    context: ChatContext,
    next: Callable[[ChatContext], Awaitable[None]],
) -> None:
    print(f"[Chat] Sending {len(context.messages)} messages to model")
    await next(context)
    print("[Chat] Model response received")
```
Because ChatMiddleware sees the exact message list going to the model, it can be used to inject system instructions, strip sensitive content, enforce token budgets, or even substitute a cached response, all without the AgentMiddleware or FunctionMiddleware layers knowing anything changed.
When to Use ChatMiddleware
Use it when you need access to the raw LLM call: injecting or modifying system level instructions per call, redacting PII from messages before they leave your infrastructure, enforcing token count limits, caching repeated inference calls, or monitoring every model request for compliance purposes.
Registration Scopes: Agent Level vs. Run Level
Microsoft Agent Framework supports two scopes for registering middleware. Understanding the difference is important for designing flexible agent systems.
Agent Level Middleware
Middleware passed in the middleware=[...] list when constructing the Agent applies to every single call to agent.run() for the lifetime of that agent. This is where you put policies that should always be enforced: security guards, mandatory audit logging, content filters.
Run Level Middleware
You can also pass middleware directly to a single agent.run() call. This middleware applies only to that one invocation and is discarded afterward. It is useful for per request customisation: adding a trace ID for a specific call, applying extra validation for a sensitive operation, or attaching a debug logger without affecting every other turn.
Choosing the Right Middleware Type
With three types available, the choice usually comes down to what you need to see and at what granularity.
Conclusion
Microsoft Agent Framework's middleware brings the same pipeline contract you know from ASP.NET Core and Express (ordered components, a context object, and a call_next() delegate) into the world of AI agents. The structural difference is that an agent turn is not a single request/response cycle but a multi-step reasoning loop, and Agent Framework exposes three separate interception points within it.
AgentMiddleware is the right home for turn level concerns: security screening, content policy, and audit logging.
FunctionMiddleware is the right home for tool level concerns: execution timing, argument validation, and tool call budgets.
ChatMiddleware is the right home for model level concerns: raw message inspection, token enforcement, and caching.
Thanks Sreeni Ramadorai