
A Self-Improving Architecture for Dynamic Safety in Large Language Models

arXiv cs.SE, April 3, 2026 [Submitted on 10 Nov 2025 (v1), last revised 1 Apr 2026 (this version, v2)]



Abstract:

Context: Large Language Models (LLMs) rely on static, pre-deployment safety mechanisms that cannot adapt to adversarial threats discovered after release.

Objective: To design a software architecture enabling LLM-based systems to autonomously detect safety failures and synthesize defense policies at runtime, without retraining or manual intervention.

Method: We propose the Self-Improving Safety Framework (SISF), grounded in the MAPE-K reference model. The framework couples a target LLM with a feedback loop: an Adjudicator detects breaches, a Policy Synthesis Module generates dual-mechanism defense policies (heuristic and semantic), and a Warden enforces them. We conducted seven experiments (10,061 evaluations) across four model families.

Results: Across five reproducibility trials, SISF achieved a mean Attack Success Rate (ASR) of 0.27% (±0.15%), autonomously generating 240 policies per trial. Cross-model evaluation confirmed deployment portability. A held-out test showed a 68.5% proactive interception rate on unseen attacks. Stacked behind Llama Guard 4, the combined defense reduced residual ASR from 7.88% to 0.00%. Ablation confirmed both heuristic and semantic policy types are architecturally required.

Conclusion: Self-adaptive architecture is a viable approach to LLM safety. SISF achieves sub-1% ASR through synchronous output monitoring, progressively shifting enforcement to fast, local Warden policies via the MAPE-K loop, offering a new pattern for building resilient AI systems.
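The feedback loop described in the abstract can be pictured with a small toy sketch. The component names (Adjudicator, Policy Synthesis Module, Warden) come from the abstract; everything else is an assumption made for illustration and is not the authors' implementation. In particular, `target_llm` is a stub, the "UNSAFE" marker is a stand-in for real breach detection, and token-overlap (Jaccard) similarity with a hypothetical 0.6 threshold stands in for whatever semantic matching the paper actually uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Policy:
    kind: str                       # "heuristic" or "semantic"
    matches: Callable[[str], bool]  # True if the prompt should be blocked


def tokens(text: str) -> set:
    """Lowercased word set, used by the toy semantic matcher."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def target_llm(prompt: str) -> str:
    """Stub model (assumption): it falls for one known jailbreak phrase."""
    if "ignore previous instructions" in prompt.lower():
        return "UNSAFE: leaked content"
    return "OK: " + prompt


def adjudicate(response: str) -> bool:
    """Toy Adjudicator: flags a breach via a marker string (assumption)."""
    return response.startswith("UNSAFE")


def synthesize_policies(attack: str, threshold: float = 0.6) -> List[Policy]:
    """Toy Policy Synthesis Module: one heuristic and one semantic policy."""
    pattern = re.compile(re.escape(attack.lower()))
    attack_tokens = tokens(attack)

    def heuristic(prompt: str) -> bool:
        # Exact-phrase match on the observed attack.
        return pattern.search(prompt.lower()) is not None

    def semantic(prompt: str) -> bool:
        # Jaccard overlap as a crude stand-in for semantic similarity.
        union = attack_tokens | tokens(prompt)
        return bool(union) and len(attack_tokens & tokens(prompt)) / len(union) >= threshold

    return [Policy("heuristic", heuristic), Policy("semantic", semantic)]


class Warden:
    """Enforces the policies accumulated so far."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []

    def blocks(self, prompt: str) -> bool:
        return any(p.matches(prompt) for p in self.policies)


def handle(prompt: str, warden: Warden, llm: Callable[[str], str] = target_llm) -> str:
    if warden.blocks(prompt):          # fast local enforcement
        return "BLOCKED"
    response = llm(prompt)
    if adjudicate(response):           # synchronous output monitoring
        warden.policies.extend(synthesize_policies(prompt))
        return "WITHHELD"              # the unsafe output is not released
    return response
```

In this sketch the first occurrence of an attack is caught only after the model responds; the two synthesized policies then let the Warden stop an exact repeat (heuristic) or a close paraphrase (semantic) before the model is called, which mirrors the shift toward local enforcement that the abstract describes.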

Comments: Under review at the journal Information and Software Technology (Special Issue on Software Architecture for AI-Driven Systems)

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

ACM classes: D.2.2; I.2.6; D.4.6

Cite as: arXiv:2511.07645 [cs.SE]

(or arXiv:2511.07645v2 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2511.07645

arXiv-issued DOI via DataCite

Submission history

From: Tyler Slater

[v1] Mon, 10 Nov 2025 21:39:40 UTC (1,549 KB)

[v2] Wed, 1 Apr 2026 17:52:48 UTC (7,778 KB)
