Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessPower Pages Authentication Methods: The Complete Guide (2026)DEV CommunityClaude Code Unpacked: what the visual guide reveals about the architectureDEV CommunityExolane Review: What It Gets Right on Custody, Funding Caps, and RiskDEV CommunityGitHub Agentic Workflows: AI Agents Are Coming for Your Repository Maintenance Tasks (And That's a Good Thing)DEV CommunityAlibaba Launches XuanTie C950 CPU for Agentic AIEE TimesThe Illusion of Data Custody in Legal AI — and the Architecture I Built to Replace ItDEV CommunityTurboQuant, KIVI, and the Real Cost of Long-Context KV CacheDEV CommunityWhy ChatGPT Cites Your Competitors (Not You)DEV CommunityIntroducing Anti-Moral RealismLessWrong AIFrom idea to live web app in minutes with Spektrum. An AI-powered web app builder for MVPs, rapid prototyping, and full-stack JavaScript apps. Skip setup, generate real products, and deploy instantly without infrastructure headaches. 🔥DEV CommunityAnthropic Just Proved That Codebase Governance Is Now the #1 Priority for Every Engineering OrgDEV CommunityThe history of Apple in photos, from the early Steve Jobs era to the iPhone launch to its 50-year markBusiness InsiderBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessPower Pages Authentication Methods: The Complete Guide (2026)DEV CommunityClaude Code Unpacked: what the visual guide reveals about the architectureDEV CommunityExolane Review: What It Gets Right on Custody, Funding Caps, and RiskDEV CommunityGitHub Agentic Workflows: AI Agents Are Coming for Your Repository Maintenance Tasks (And That's a Good Thing)DEV CommunityAlibaba Launches XuanTie C950 CPU for Agentic AIEE TimesThe Illusion of Data Custody in Legal AI — and the Architecture I Built to Replace ItDEV CommunityTurboQuant, KIVI, and the Real Cost of Long-Context KV CacheDEV CommunityWhy ChatGPT Cites Your Competitors (Not You)DEV CommunityIntroducing Anti-Moral RealismLessWrong AIFrom idea to live web app in minutes with Spektrum. An AI-powered web app builder for MVPs, rapid prototyping, and full-stack JavaScript apps. Skip setup, generate real products, and deploy instantly without infrastructure headaches. 🔥DEV CommunityAnthropic Just Proved That Codebase Governance Is Now the #1 Priority for Every Engineering OrgDEV CommunityThe history of Apple in photos, from the early Steve Jobs era to the iPhone launch to its 50-year markBusiness Insider

A Universal Vibe? Finding and Controlling Language-Agnostic Informal Register with SAEs

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2603.26236v1 Announce Type: new Abstract: While multilingual language models successfully transfer factual and syntactic knowledge across languages, it remains unclear whether they process culture-specific pragmatic registers, such as slang, as isolated language-specific memorizations or as unified, abstract concepts. We study this by probing the internal representations of Gemma-2-9B-IT using Sparse Autoencoders (SAEs) across three typologically diverse source languages: English, Hebrew, and Russian. To definitively isolate pragmatic register processing from trivial lexical sensitivity, — Uri Z. Kialy, Avi Shtarkberg, Ayal Klein

View PDF HTML (experimental)

Abstract:While multilingual language models successfully transfer factual and syntactic knowledge across languages, it remains unclear whether they process culture-specific pragmatic registers, such as slang, as isolated language-specific memorizations or as unified, abstract concepts. We study this by probing the internal representations of Gemma-2-9B-IT using Sparse Autoencoders (SAEs) across three typologically diverse source languages: English, Hebrew, and Russian. To definitively isolate pragmatic register processing from trivial lexical sensitivity, we introduce a novel dataset in which every target term is polysemous, appearing in both literal and informal contexts. We find that while much of the informal-register signal is distributed across language-specific features, a small but highly robust cross-linguistic core consistently emerges. This shared core forms a geometrically coherent ``informal register subspace'' that sharpens in the model's deeper layers. Crucially, these shared representations are not merely correlational: activation steering with these features causally shifts output formality across all source languages and transfers zero-shot to six unseen languages spanning diverse language families and scripts. Together, these results provide the first mechanistic evidence that multilingual LLMs internalize informal register not just as surface-level heuristics, but as a portable, language-agnostic pragmatic abstraction.

Subjects:

Computation and Language (cs.CL)

Cite as: arXiv:2603.26236 [cs.CL]

(or arXiv:2603.26236v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26236

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Uri Zion Kialy [view email] [v1] Fri, 27 Mar 2026 09:58:31 UTC (3,482 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
A Universal…researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 155 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers