
Switch Attention: Towards Dynamic and Fine-grained Hybrid Transformers

arXiv · March 30, 2026

Yusheng Zhao, Hourun Li, Bohan Wu, Jingyang Yuan, Meng Zhang, Yichun Yin, Lifeng Shang, Ming Zhang (arXiv:2603.26380v1, Announce Type: new)


Abstract: The attention mechanism has been the core component in modern transformer architectures. However, the computation of standard full attention scales quadratically with the sequence length, serving as a major bottleneck in long-context language modeling. Sliding window attention restricts the context length for better efficiency at the cost of narrower receptive fields. While existing efforts attempt to take the benefits from both sides by building hybrid models, they often resort to static, heuristically designed alternating patterns that limit efficient allocation of computation in various scenarios. In this paper, we propose Switch Attention (SwiAttn), a novel hybrid transformer that enables dynamic and fine-grained routing between full attention and sliding window attention. For each token at each transformer layer, SwiAttn dynamically routes the computation to either a full-attention branch for global information aggregation or a sliding-window branch for efficient local pattern matching. An adaptive regularization objective is designed to encourage the model towards efficiency. Moreover, we adopt continual pretraining to optimize the model, transferring the full attention architecture to the hybrid one. Extensive experiments are conducted on twenty-three benchmark datasets across both regular (4K) and long (32K) context lengths, demonstrating the effectiveness of the proposed method.
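The abstract describes per-token routing between a full-attention branch and a sliding-window branch, but it does not specify the router architecture, how the discrete choice is trained, or the exact form of the adaptive regularizer. The toy, single-head, pure-Python sketch below only illustrates the general idea under loudly stated assumptions: a hypothetical linear-sigmoid router on the query (`router_w`), a hard `threshold`, keys and values shared by both branches, and a simple budget hinge penalty (`budget_penalty`, hypothetical) standing in for the paper's adaptive regularization objective. It shows where the savings come from: a token at position t routed to full attention scores t+1 keys, while a sliding-window token scores at most `window` keys.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def visible_positions(t: int, use_full: bool, window: int) -> List[int]:
    # Causal key positions for query t: every token up to t (full branch)
    # or only the most recent `window` tokens, including t (sliding branch).
    start = 0 if use_full else max(0, t - window + 1)
    return list(range(start, t + 1))


def attend(q: Vector, keys: List[Vector], values: List[Vector],
           positions: List[int]) -> Vector:
    # Scaled dot-product attention of one query over the selected positions.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in positions]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, positions):
        for c, v in enumerate(values[j]):
            out[c] += w * v
    return out


def switch_attention_layer(queries: List[Vector], keys: List[Vector],
                           values: List[Vector], router_w: Vector,
                           window: int, threshold: float = 0.5
                           ) -> Tuple[List[Vector], List[float], int]:
    # HYPOTHETICAL router: p_full = sigmoid(router_w . q_t). Tokens with
    # p_full >= threshold use the full-attention branch; the rest use the
    # sliding-window branch. Returns the outputs, the per-token p_full
    # values, and the total number of query-key scores computed (the cost).
    outputs, p_full, cost = [], [], 0
    for t, q in enumerate(queries):
        p = sigmoid(sum(w * x for w, x in zip(router_w, q)))
        positions = visible_positions(t, p >= threshold, window)
        outputs.append(attend(q, keys, values, positions))
        p_full.append(p)
        cost += len(positions)
    return outputs, p_full, cost


def budget_penalty(p_full: List[float], target_full_ratio: float) -> float:
    # Stand-in efficiency regularizer (NOT the paper's objective): penalize
    # the mean full-attention probability only when it exceeds a budget.
    mean_full = sum(p_full) / len(p_full)
    return max(0.0, mean_full - target_full_ratio)


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    # threshold=0.0 forces every token to full attention; 1.1 forces sliding.
    _, _, full_cost = switch_attention_layer(xs, xs, xs, [0.0, 0.0], window=2, threshold=0.0)
    _, _, local_cost = switch_attention_layer(xs, xs, xs, [0.0, 0.0], window=2, threshold=1.1)
    print(full_cost, local_cost)
```

With 4 tokens and a window of 2, routing every token to full attention scores 1+2+3+4 = 10 query-key pairs versus 1+2+2+2 = 7 for all-sliding. In general, full attention over n tokens costs n(n+1)/2 scores, while sliding-window attention costs at most n·window, so a router that sends most tokens to the local branch moves the cost from quadratic toward linear in sequence length. The hard threshold used here is not differentiable, and the abstract does not say how SwiAttn trains its routing decisions, so this sketch should not be read as the paper's method.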

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.26380 [cs.CL] (or arXiv:2603.26380v1 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.26380 (arXiv-issued DOI via DataCite)

Submission history

From: Yusheng Zhao

[v1] Fri, 27 Mar 2026 13:04:29 UTC (1,150 KB)

Original source: arXiv, https://arxiv.org/abs/2603.26380



