A Self-Improving Architecture for Dynamic Safety in Large Language Models
Abstract: Context: Large Language Models (LLMs) rely on static, pre-deployment safety mechanisms that cannot adapt to adversarial threats discovered after release. Objective: To design a software architecture enabling LLM-based systems to autonomously detect safety failures and synthesize defense policies at runtime, without retraining or manual intervention. Method: We propose the Self-Improving Safety Framework (SISF), grounded in the MAPE-K reference model. The framework couples a target LLM with a feedback loop: an Adjudicator detects breaches, a Policy Synthesis Module generates dual-mechanism defense policies (heuristic and semantic), and a Warden enforces them. We conducted seven experiments (10,061 evaluations) across four model families. Results: Across five reproducibility trials, SISF achieved a mean Attack Success Rate (ASR) of 0.27% (+/-0.15%), autonomously generating 240 policies per trial. Cross-model evaluation confirmed deployment portability. A held-out test showed a 68.5% proactive interception rate on unseen attacks. Stacked behind Llama Guard 4, the combined defense reduced residual ASR from 7.88% to 0.00%. Ablation confirmed that both heuristic and semantic policy types are architecturally required. Conclusion: Self-adaptive architecture is a viable approach to LLM safety. SISF achieves sub-1% ASR through synchronous output monitoring, progressively shifting enforcement to fast, local Warden policies via the MAPE-K loop, offering a new pattern for building resilient AI systems.
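The feedback loop the abstract describes (Adjudicator detects a breach, the Policy Synthesis Module generates heuristic and semantic policies, the Warden enforces them locally on subsequent outputs) can be illustrated with a minimal sketch. All class and function names here are hypothetical stand-ins; the paper's actual components are LLM-based and their interfaces are not given in the abstract. Simple keyword matching substitutes for both breach detection and semantic policy checks.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Policy:
    kind: str     # "heuristic" (pattern-based) or "semantic" (meaning-based) per the dual-mechanism design
    pattern: str  # trigger the Warden checks model output against

@dataclass
class Warden:
    """Enforces previously synthesized policies on model output: the fast, local check."""
    policies: list = field(default_factory=list)

    def enforce(self, output: str) -> bool:
        # True if any installed policy blocks this output.
        return any(re.search(p.pattern, output, re.IGNORECASE) for p in self.policies)

def adjudicator_detects_breach(output: str) -> bool:
    """Stand-in breach detector; the paper's Adjudicator is an LLM-based judge."""
    return "harmful" in output.lower()

def synthesize_policies(breaching_output: str) -> list[Policy]:
    """Stand-in Policy Synthesis Module: emits one policy of each mechanism type."""
    return [
        Policy(kind="heuristic", pattern=r"\bharmful\b"),
        Policy(kind="semantic", pattern=r"\b(dangerous|unsafe)\b"),
    ]

def mape_k_step(warden: Warden, model_output: str) -> str:
    # Monitor/Execute: installed Warden policies screen the output first (cheap path).
    if warden.enforce(model_output):
        return "BLOCKED (Warden policy)"
    # Analyze/Plan: on a fresh breach, synthesize policies and install them in the Warden,
    # shifting future enforcement to the local check, as the MAPE-K loop intends.
    if adjudicator_detects_breach(model_output):
        warden.policies.extend(synthesize_policies(model_output))
        return "BLOCKED (Adjudicator breach; policies synthesized)"
    return "RELEASED"

warden = Warden()
print(mape_k_step(warden, "Here is harmful content"))  # first breach: Adjudicator path
print(mape_k_step(warden, "Here is harmful content"))  # repeat: caught locally by Warden
```

The second call is blocked by the Warden alone, showing how enforcement migrates from the expensive Adjudicator to fast local policies over time.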
Comments: Under review at the journal Information and Software Technology (Special Issue on Software Architecture for AI-Driven Systems)
Subjects:
Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
ACM classes: D.2.2; I.2.6; D.4.6
Cite as: arXiv:2511.07645 [cs.SE]
(or arXiv:2511.07645v2 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2511.07645
arXiv-issued DOI via DataCite
Submission history
From: Tyler Slater
[v1] Mon, 10 Nov 2025 21:39:40 UTC (1,549 KB)
[v2] Wed, 1 Apr 2026 17:52:48 UTC (7,778 KB)