A Self-Improving Architecture for Dynamic Safety in Large Language Models
Abstract: Context: Large Language Models (LLMs) rely on static, pre-deployment safety mechanisms that cannot adapt to adversarial threats discovered after release. Objective: To design a software architecture enabling LLM-based systems to autonomously detect safety failures and synthesize defense policies at runtime, without retraining or manual intervention. Method: We propose the Self-Improving Safety Framework (SISF), grounded in the MAPE-K reference model. The framework couples a target LLM with a feedback loop: an Adjudicator detects breaches, a Policy Synthesis Module generates dual-mechanism defense policies (heuristic and semantic), and a Warden enforces them. We conducted seven experiments (10,061 evaluations) across four model families. Results: Across five reproducibility trials, SISF achieved a mean Attack Success Rate (ASR) of 0.27% (+/-0.15%), autonomously generating 240 policies per trial. Cross-model evaluation confirmed deployment portability. A held-out test showed a 68.5% proactive interception rate on unseen attacks. Stacked behind Llama Guard 4, the combined defense reduced residual ASR from 7.88% to 0.00%. Ablation confirmed that both heuristic and semantic policy types are architecturally required. Conclusion: Self-adaptive architecture is a viable approach to LLM safety. SISF achieves sub-1% ASR through synchronous output monitoring, progressively shifting enforcement to fast, local Warden policies via the MAPE-K loop, offering a new pattern for building resilient AI systems.
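The feedback loop the abstract describes (Adjudicator detects a breach, the Policy Synthesis Module generates heuristic and semantic policies, the Warden enforces them locally on subsequent outputs) can be illustrated with a minimal sketch. All class and function names here are hypothetical stand-ins; the paper's actual components are LLM-based and their interfaces are not given in the abstract. Simple keyword matching substitutes for both breach detection and semantic policy checks.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Policy:
    kind: str     # "heuristic" (pattern-based) or "semantic" (meaning-based) per the dual-mechanism design
    pattern: str  # trigger the Warden checks model output against

@dataclass
class Warden:
    """Enforces previously synthesized policies on model output: the fast, local check."""
    policies: list = field(default_factory=list)

    def enforce(self, output: str) -> bool:
        # True if any installed policy blocks this output.
        return any(re.search(p.pattern, output, re.IGNORECASE) for p in self.policies)

def adjudicator_detects_breach(output: str) -> bool:
    """Stand-in breach detector; the paper's Adjudicator is an LLM-based judge."""
    return "harmful" in output.lower()

def synthesize_policies(breaching_output: str) -> list[Policy]:
    """Stand-in Policy Synthesis Module: emits one policy of each mechanism type."""
    return [
        Policy(kind="heuristic", pattern=r"\bharmful\b"),
        Policy(kind="semantic", pattern=r"\b(dangerous|unsafe)\b"),
    ]

def mape_k_step(warden: Warden, model_output: str) -> str:
    # Monitor/Execute: installed Warden policies screen the output first (cheap path).
    if warden.enforce(model_output):
        return "BLOCKED (Warden policy)"
    # Analyze/Plan: on a fresh breach, synthesize policies and install them in the Warden,
    # shifting future enforcement to the local check, as the MAPE-K loop intends.
    if adjudicator_detects_breach(model_output):
        warden.policies.extend(synthesize_policies(model_output))
        return "BLOCKED (Adjudicator breach; policies synthesized)"
    return "RELEASED"

warden = Warden()
print(mape_k_step(warden, "Here is harmful content"))  # first breach: Adjudicator path
print(mape_k_step(warden, "Here is harmful content"))  # repeat: caught locally by Warden
```

The second call is blocked by the Warden alone, showing how enforcement migrates from the expensive Adjudicator to fast local policies over time.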
Comments: Under review at the journal Information and Software Technology (Special Issue on Software Architecture for AI-Driven Systems)
Subjects:
Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
ACM classes: D.2.2; I.2.6; D.4.6
Cite as: arXiv:2511.07645 [cs.SE]
(or arXiv:2511.07645v2 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2511.07645
arXiv-issued DOI via DataCite
Submission history
From: Tyler Slater
[v1] Mon, 10 Nov 2025 21:39:40 UTC (1,549 KB)
[v2] Wed, 1 Apr 2026 17:52:48 UTC (7,778 KB)