Agent Middleware in Microsoft Agent Framework 1.0
A familiar pipeline pattern applied to AI agents
Covers all three middleware types, registration scopes, termination, result override, and when to use each
Not a New Idea
If you have used ASP.NET Core or Express.js, you already understand the core concept. Both frameworks let you register a chain of functions around every request. Each function receives a context and a next() delegate. Calling next() continues the chain; not calling it short-circuits the pipeline. That is the pipeline pattern: a clean way to apply cross-cutting concerns like logging, authentication, and error handling without touching any business logic.
Microsoft’s Agent Framework applies this exact pattern to AI agents. The next() delegate becomes call_next(), the context object holds the agent’s conversation instead of an HTTP request, and the pipeline wraps an AI reasoning turn instead of a web request. If you know app.Use() or app.use(), you already know the shape of what follows.
What is new, and worth understanding deeply, is that an agent turn is not a single request/response cycle. It is a multi step reasoning loop, and Agent Framework exposes three distinct interception points within it. The rest of this post covers all three types, how they differ, when to use each, and how they come together in a real SQL agent example.
Middleware
The Agent Framework supports three types of middleware, each intercepting a different layer of execution:
- Agent middleware wraps agent runs, giving you access to inputs, outputs, and overall control flow.
- Function middleware wraps individual tool calls, enabling input validation, result transformation, and execution control.
- Chat middleware wraps the underlying requests sent to AI models, exposing raw messages, options, and responses.
All three types support both function based and class based implementations.
Chaining
When multiple middleware of the same type are registered, they execute as a chain: each middleware calls call_next() to hand off control to the next one in line.
Rather than passing updated values into call_next() as arguments, middleware mutates the shared context object directly. This means any changes you make to the context before calling call_next() are automatically visible to downstream middleware, with no need to thread values through the call explicitly.
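This mutation-based contract can be sketched in plain Python. Everything here is an illustrative stand-in (no framework imports): the dict context and the run_chain helper simulate what the framework does internally.

```python
import asyncio

async def add_trace_id(context: dict, call_next) -> None:
    context["trace_id"] = "abc-123"  # mutate the shared context before handing off
    await call_next()                # downstream middleware sees the change

async def log_trace_id(context: dict, call_next) -> None:
    # The value set by the upstream middleware is already visible here,
    # even though nothing was passed into call_next().
    context["seen_downstream"] = context.get("trace_id")
    await call_next()

async def run_chain(middlewares, context):
    async def terminal():            # stand-in for the agent itself
        context["done"] = True
    nxt = terminal
    for mw in reversed(middlewares):
        # Bind the current middleware and its downstream continuation.
        nxt = (lambda m, n: (lambda: m(context, n)))(mw, nxt)
    await nxt()

ctx: dict = {}
asyncio.run(run_chain([add_trace_id, log_trace_id], ctx))
print(ctx["seen_downstream"])  # abc-123
```

The downstream middleware never received the trace ID as an argument; it simply read it off the shared context.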
Execution Order
Agent level middleware always wraps run level middleware. Given agent middleware [A1, A2] and run middleware [R1, R2], the execution order is:
```
A1 → A2 → R1 → R2 → Agent → R2 → R1 → A2 → A1
```
Function and chat middleware follow the same wrapping principle, applied at the time of each tool call or chat request respectively.
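The wrapping order above can be simulated with plain Python. This is an illustrative sketch, not framework code: each layer records when it enters and exits.

```python
import asyncio

order: list[str] = []

def make_mw(name):
    async def mw(call_next):
        order.append(f"{name}:before")
        await call_next()
        order.append(f"{name}:after")
    return mw

async def agent_core():
    order.append("Agent")

async def run_pipeline(mws, core):
    nxt = core
    for mw in reversed(mws):
        # Each layer wraps the one after it, innermost built first.
        nxt = (lambda m, n: (lambda: m(n)))(mw, nxt)
    await nxt()

# Agent-level [A1, A2] wraps run-level [R1, R2].
asyncio.run(run_pipeline([make_mw(n) for n in ("A1", "A2", "R1", "R2")], agent_core))
print(" → ".join(order))
```

Running this prints the same before/after sequence shown above: every layer's "after" code runs in reverse registration order as the call stack unwinds.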
Why we need it
The biggest value is not convenience; it is correctness and consistency.
Without middleware, teams usually end up in one or both of these patterns:
Pattern 1: policy hidden in prompts
Example instruction:
"Never run destructive SQL. Never send data to personal email."
This is useful guidance, but it is still model behavior, not a hard gate. As prompts get long, tools increase, and edge cases appear, this policy can become inconsistent. It is also hard to audit after the fact.
Pattern 2: policy duplicated in each tool
```python
def run_sql(query: str) -> str:
    if "drop" in query.lower():
        return "blocked"
    ...

def export_data(target: str) -> str:
    if "gmail.com" in target.lower():
        return "blocked"
    ...

def quote_inventory_line(quantity: int) -> str:
    if quantity > 10000:
        return "blocked"
    ...
```
This looks safe, but it creates:
- duplicated logic
- inconsistent rules across tools
- expensive updates when policy changes
Middleware fixes both
With middleware, concerns live at the right boundary:
- run-level checks in Agent middleware
- per-tool checks in Function middleware
- model-call telemetry/metadata in Chat middleware
Result:

- cleaner tools
- stronger guardrails
- easier tests
- better observability
1. Agent Middleware: The Outermost Layer
Agent middleware is the outermost layer of the pipeline. It fires once per turn, before any LLM call is made and after the final response is produced, making it the right place for concerns that span the entire turn: input validation, security screening, audit logging, and output transformation.
Implementation Styles & Chaining
Agent middleware supports both class-based and function-based implementations; the two are fully equivalent, and the choice comes down to whether you need instance state or prefer a lighter syntax. When multiple middleware components are registered, they form a chain. Each component is responsible for calling call_next() to pass control to the next layer; omitting this call short-circuits the pipeline, preventing any downstream middleware or the LLM from running.
Note that call_next() takes no arguments. Instead of passing updated values explicitly, middleware mutates the shared AgentContext object directly: any changes made before await call_next() are automatically visible to everything further down the chain.
Class-Based Implementation
Subclass AgentMiddleware and override process(). The example below shows SecurityAgentMiddleware: it inspects the latest user message and short-circuits the pipeline if it detects a threat, so the LLM is never invoked for blocked requests.
```python
class SecurityAgentMiddleware(AgentMiddleware):
    """Agent-level guard: blocks risky user chat text before the model runs.

    Inspects context.messages[-1] (latest user turn). If unsafe_input_reason
    returns a reason, sets context.result to a canned assistant reply and does
    not call call_next(), so the LLM and tools are skipped for that turn.
    """

    async def process(
        self,
        context: AgentContext,
        call_next: Callable[[], Awaitable[None]],
    ) -> None:
        # Only the latest user utterance is checked (typical for a single-turn REPL).
        last_message = context.messages[-1] if context.messages else None
        if last_message and last_message.text:
            query = last_message.text
            reason = unsafe_input_reason(query)
            if reason:
                print(f"[SecurityAgentMiddleware] Security Warning: {reason}; blocking request.")
                # Short-circuit: set the assistant reply here; do NOT call
                # call_next() -> no LLM, no tools.
                context.result = AgentResponse(
                    messages=[
                        Message(
                            "assistant",
                            [f"Request blocked: {reason}."],
                        )
                    ]
                )
                return
        print("[SecurityAgentMiddleware] Security check passed.")
        # Continue pipeline: model + optional run_sql; function middleware runs inside tool path.
        await call_next()
```
Here is the unsafe_input_reason function. For brevity, the helper predicates it calls are omitted.
```python
def unsafe_input_reason(query: str) -> str | None:
    """Classify why a user message should be blocked, or None if it may proceed.

    Checks run in order: injection-style patterns first, then destructive
    natural language.
    """
    # Order matters: catch obvious SQL fragments before broader NL patterns.
    if _looks_like_dangerous_sql(query):
        return "injection-style or suspicious SQL fragment in your message"
    if _looks_like_destructive_database_intent(query):
        return "destructive database request (e.g. delete/drop/truncate)"
    return None
```
Function-Based and Decorator-Based Styles
Agent Framework also supports function-based and decorator-based implementations. All three styles are equivalent; choose based on whether you need state or explicit type annotations.
Function-based
```python
async def logging_agent_middleware(
    context: AgentContext,
    next: Callable[[AgentContext], Awaitable[None]],
) -> None:
    print("[Agent] Turn starting")
    await next(context)
    print("[Agent] Turn completed")
```
Decorator-based (no type annotation required)
```python
@agent_middleware
async def simple_agent_middleware(context, next):
    print("Before agent execution")
    await next(context)
    print("After agent execution")
```
Registering Middleware
Middleware is registered when constructing the agent. Pass a list to the middleware argument; different middleware types can be mixed in the same list, and the framework routes each to the correct pipeline layer automatically:
```python
FOUNDRY_PROJECT_ENDPOINT = "https://sreeniagent.services.ai.azure.com/api/projects/sreeni_foundry"
FOUNDRY_MODEL = "gpt-4.1"

async with (
    AzureCliCredential() as credential,
    Agent(
        client=FoundryChatClient(
            credential=credential,
            project_endpoint=FOUNDRY_PROJECT_ENDPOINT,  # Your Microsoft Foundry project URL
            model=FOUNDRY_MODEL,  # The model you deployed
        ),
        name="Sreeni-SqlAssistant",
        instructions=(
            "You help users query a small demo database. "
            "The only table is customers with columns id, name, city. "
            "Always use the run_sql tool with a proper SELECT; explain results briefly."
        ),
        tools=run_sql,
        # Agent middleware wraps the turn; function middleware wraps each tool call
        middleware=[SecurityAgentMiddleware(), LoggingFunctionMiddleware()],
    ) as agent,
):
```
When to Use Agent Middleware
Agent middleware is the right choice for any concern that applies to the turn as a whole, rather than to a specific tool call or model request.
2. FunctionMiddleware: The Tool-Call Layer
FunctionMiddleware fires inside the agent turn, but only when the LLM decides to invoke a tool. A single agent turn can trigger multiple tool calls, and FunctionMiddleware wraps each one independently. This makes it the right place for concerns that are specific to tool execution: timing, input validation, result transformation, and tool call auditing.
The FunctionInvocationContext Object
Each FunctionMiddleware component receives a FunctionInvocationContext, which is scoped to a single tool invocation: it exposes the function being called, its arguments, the result slot, and a terminate flag for controlling the tool-calling loop.
When to Use FunctionMiddleware
Use it for concerns specific to tool execution: timing and performance monitoring, validating or sanitising tool arguments before the tool runs, capping the number of times a tool may be called in one turn, transforming tool results before the LLM sees them, or auditing exactly which tools were called and with what arguments.
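As a concrete illustration of the timing case, here is a sketch of a timing middleware. The context class below is a stand-in for the framework's FunctionInvocationContext, so the field names are assumptions made for the demo.

```python
import asyncio
import time
from dataclasses import dataclass, field

@dataclass
class FakeFunctionContext:  # stand-in for FunctionInvocationContext
    function_name: str
    result: object = None
    metadata: dict = field(default_factory=dict)

async def timing_function_middleware(context, call_next) -> None:
    start = time.perf_counter()
    await call_next()  # run the tool (and any inner middleware)
    context.metadata["elapsed_s"] = time.perf_counter() - start
    print(f"[Function] {context.function_name} took {context.metadata['elapsed_s']:.3f}s")

async def fake_tool(context) -> None:
    await asyncio.sleep(0.01)  # simulate tool work
    context.result = "42 rows"

ctx = FakeFunctionContext(function_name="run_sql")
asyncio.run(timing_function_middleware(ctx, lambda: fake_tool(ctx)))
```

Because the middleware wraps each tool call independently, a turn with three tool calls would produce three separate timing records.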
Terminating the Function-Calling Loop
Setting context.terminate = True inside FunctionMiddleware does something powerful: it stops the LLM’s function calling loop entirely. The LLM will not receive the tool result and will not make any further tool calls in this turn. This is useful for enforcing tool call budgets or stopping a loop that is going in an undesirable direction.
```python
@function_middleware
async def budget_middleware(context, next):
    if context.function.name == "run_sql":
        # Allow at most one SQL query per turn
        call_count = context.metadata.get("sql_calls", 0)
        if call_count >= 1:
            context.result = "Query limit reached for this turn."
            context.terminate = True  # stop the LLM tool-calling loop
            return
        context.metadata["sql_calls"] = call_count + 1
    await next(context)
```
Warning: Termination and Chat History. Terminating the function-calling loop can leave the chat history in an inconsistent state: a tool-call message with no corresponding tool result. This may cause errors if the same history is used in subsequent agent runs. Use termination carefully and consider clearing or repairing the history afterward.
3. ChatMiddleware: The LLM-Call Layer
ChatMiddleware is the deepest layer. It wraps the actual inference call sent to the underlying language model: the raw list of messages, the model options, and the response that comes back. This layer fires for every call to the LLM within a turn, which can be more than one if tools are used.
The ChatContext Object
Each ChatMiddleware component receives a ChatContext.
Function-Based Example
```python
async def logging_chat_middleware(
    context: ChatContext,
    next: Callable[[ChatContext], Awaitable[None]],
) -> None:
    print(f"[Chat] Sending {len(context.messages)} messages to model")
    await next(context)
    print("[Chat] Model response received")
```
Because ChatMiddleware sees the exact message list going to the model, it can be used to inject system instructions, strip sensitive content, enforce token budgets, or even substitute a cached response, all without the AgentMiddleware or FunctionMiddleware layers knowing anything changed.
When to Use ChatMiddleware
Use it when you need access to the raw LLM call: injecting or modifying system level instructions per call, redacting PII from messages before they leave your infrastructure, enforcing token count limits, caching repeated inference calls, or monitoring every model request for compliance purposes.
Registration Scopes: Agent Level vs. Run Level
Microsoft Agent Framework supports two scopes for registering middleware. Understanding the difference is important for designing flexible agent systems.
Agent Level Middleware
Middleware passed in the middleware=[...] list when constructing the Agent applies to every single call to agent.run() for the lifetime of that agent. This is where you put policies that should always be enforced: security guards, mandatory audit logging, content filters.
Run Level Middleware
You can also pass middleware directly to a single agent.run() call. This middleware applies only to that one invocation and is discarded afterward. It is useful for per request customisation: adding a trace ID for a specific call, applying extra validation for a sensitive operation, or attaching a debug logger without affecting every other turn.
Choosing the Right Middleware Type
With three types available, the choice usually comes down to what you need to see and at what granularity.
Conclusion
Microsoft Agent Framework's middleware brings the same pipeline contract you know from ASP.NET Core and Express (ordered components, a context object, and a call_next() delegate) into the world of AI agents. The structural difference is that an agent turn is not a single request/response cycle but a multi-step reasoning loop, and Agent Framework exposes three separate interception points within it.
AgentMiddleware is the right home for turn level concerns: security screening, content policy, and audit logging.
FunctionMiddleware is the right home for tool level concerns: execution timing, argument validation, and tool call budgets.
ChatMiddleware is the right home for model level concerns: raw message inspection, token enforcement, and caching.
Thanks Sreeni Ramadorai