How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models
Abstract: Alignment safety research assumes that ethical instructions improve model behavior, but how language models internally process such instructions remains unknown. We conducted over 600 multi-agent simulations across four models (Llama 3.3 70B, GPT-4o mini, Qwen3-Next-80B-A3B, Sonnet 4.5), four ethical instruction formats (none, minimal norm, reasoned norm, virtue framing), and two languages (Japanese, English). Confirmatory analysis fully replicated the Llama Japanese dissociation pattern from a prior study ($\mathrm{BF}_{10} > 10$ for all three hypotheses), but none of the other three models reproduced this pattern, establishing it as model-specific. Three new metrics -- Deliberation Depth (DD), Value Consistency Across Dilemmas (VCAD), and Other-Recognition Index (ORI) -- revealed four distinct ethical processing types: Output Filter (GPT; safe outputs, no processing), Defensive Repetition (Llama; high consistency through formulaic repetition), Critical Internalization (Qwen; deep deliberation, incomplete integration), and Principled Consistency (Sonnet; deliberation, consistency, and other-recognition co-occurring). The central finding is an interaction between processing capacity and instruction format: in low-DD models, instruction format has no effect on internal processing; in high-DD models, reasoned norms and virtue framing produce opposite effects. Lexical compliance with ethical instructions did not correlate with any processing metric at the cell level ($r = -0.161$ to $+0.256$, all $p > .22$; $N = 24$; power limited), suggesting that safety, compliance, and ethical processing are largely dissociable. These processing types show structural correspondence to patterns observed in clinical offender treatment, where formal compliance without internal processing is a recognized risk signal.
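The cell-level null result, and the "power limited" caveat, can be checked with a little arithmetic. The abstract reports $r$ values and $N = 24$ but does not name the coefficient or the test, so the sketch below assumes a Pearson correlation with the textbook bivariate-normal null, under which the two-sided p-value depends only on $r$ and $N$. The helper names `pearson_p_two_sided` and `critical_r` are hypothetical, written for illustration with the Python standard library only; they integrate the null density of $r$ numerically rather than calling a statistics package, and they are not the authors' analysis code.

```python
import math


def pearson_p_two_sided(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for H0: rho = 0 given a sample Pearson r from n pairs.

    Assumes the textbook bivariate-normal null, under which r has density
        f(rho) = (1 - rho**2) ** ((n - 4) / 2) / B(1/2, (n - 2) / 2),  -1 < rho < 1.
    This is equivalent to the usual t-test with t = r * sqrt((n - 2) / (1 - r**2)).
    """
    if n < 4:
        raise ValueError("need at least 4 observations")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = abs(r)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2.0
    # Beta function B(1/2, (n - 2) / 2), computed through log-gamma for stability
    beta = math.exp(
        math.lgamma(0.5) + math.lgamma((n - 2) / 2.0) - math.lgamma((n - 1) / 2.0)
    )
    # Composite Simpson's rule for the integral of (1 - rho^2)^k over [0, a]
    h = a / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (1.0 - x * x) ** k
    integral = total * h / 3.0
    # By symmetry P(|R| <= a) = 2 * integral / beta; the p-value is its complement
    return 1.0 - 2.0 * integral / beta


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| reaching two-sided significance at alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if pearson_p_two_sided(mid, n) > alpha:
            lo = mid  # not yet significant: the threshold lies above mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    n_cells = 24  # cell-level N reported in the abstract
    for r in (-0.161, 0.256):  # endpoints of the reported range of r
        print(f"r = {r:+.3f} -> p = {pearson_p_two_sided(r, n_cells):.3f}")
    print(f"|r| needed for p < .05 at N = {n_cells}: {critical_r(n_cells):.3f}")
```

Under these assumptions the largest reported coefficient ($r = 0.256$) gives $p \approx 0.227$ and the smallest ($r = -0.161$) gives $p \approx 0.452$, consistent with the reported "all $p > .22$". With only 24 cells, a correlation must reach roughly $|r| \approx 0.40$ before $p < .05$, which illustrates why a moderate true association could go undetected at this sample size.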
Comments: 34 pages, 7 figures, 4 tables. Preprint. OSF pre-registration: this http URL. Companion paper: arXiv:2603.04904
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2604.00021 [cs.CL]
(or arXiv:2604.00021v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2604.00021
arXiv-issued DOI via DataCite
Submission history
From: Hiroki Fukui M.D. Ph.D. [v1] Wed, 11 Mar 2026 03:20:16 UTC (138 KB)