Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment

arXiv, March 30, 2026

Authors: Spiros Baxevanakis, Platon Karageorgis, Ioannis Dravilas, Konrad Szewczyk

Abstract: Training Vision Transformers (ViTs) presents significant challenges, one of which is the emergence of artifacts in attention maps, hindering their interpretability. Darcet et al. (2024) investigated this phenomenon and attributed it to ViTs' need to store global information beyond the [CLS] token. They proposed a novel solution involving the addition of empty input tokens, named registers, which successfully eliminate artifacts and improve the clarity of attention maps. In this work, we reproduce the findings of Darcet et al. (2024) and evaluate the generalizability of their claims across multiple models, including DINO, DINOv2, OpenCLIP, and DeiT3. While we confirm the validity of several of their key claims, our results reveal that some claims do not extend universally to other models. Additionally, we explore the impact of model size, extending their findings to smaller models. Finally, we resolve terminology inconsistencies found in the original paper and explain their impact when generalizing to a wider range of models.
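To make the register idea concrete, the following is a minimal sketch of how extra tokens could be added to a ViT input sequence and dropped again at the output. It is not the authors' code. The function names, the token ordering ([CLS], then registers, then patches), the sizes, and the random initialization (a stand-in for learned values) are all assumptions made for illustration, and the transformer itself is replaced by an identity step.

```python
import random

def add_registers(patch_tokens, cls_token, register_tokens):
    """Build the input sequence: [CLS] + registers + patch tokens.

    The ordering is an assumption of this sketch.
    """
    return [cls_token] + list(register_tokens) + list(patch_tokens)

def split_outputs(output_tokens, num_registers):
    """Split an output sequence into CLS, register, and patch parts.

    Only the CLS and patch outputs would be used downstream;
    the register outputs are simply discarded.
    """
    cls_out = output_tokens[0]
    register_out = output_tokens[1:1 + num_registers]
    patch_out = output_tokens[1 + num_registers:]
    return cls_out, register_out, patch_out

# Hypothetical sizes, chosen only to keep the example small.
embed_dim = 8
num_patches = 16
num_registers = 4

rng = random.Random(0)

def new_token():
    # Random values stand in for learned embeddings.
    return [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]

cls_token = new_token()
register_tokens = [new_token() for _ in range(num_registers)]
patch_tokens = [new_token() for _ in range(num_patches)]

sequence = add_registers(patch_tokens, cls_token, register_tokens)

# A real model would run transformer blocks over `sequence` here.
# The identity step below only keeps the sketch self-contained.
output_tokens = sequence

cls_out, register_out, patch_out = split_outputs(output_tokens, num_registers)
print(len(sequence), len(register_out), len(patch_out))
```

The registers take part in attention like any other token, so the sequence grows by `num_registers` entries, while the number of patch outputs used downstream is unchanged.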

Comments: Preprint. Submitted to Transactions on Machine Learning Research (TMLR). 26 pages, 17 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

ACM classes: I.2.10; I.4.8; I.5.4

Cite as: arXiv:2603.25803 [cs.CV]

(or arXiv:2603.25803v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25803

arXiv-issued DOI via DataCite

Submission history

From: Spiros Baxevanakis
[v1] Thu, 26 Mar 2026 18:09:12 UTC (27,638 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25803
