Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessCommunity Without Tokens: What AI Dev Tools Can Learn from Crypto's Community PlaybookDev.to AIGarry Tan's gstack: Install This 56k-Star 'Virtual Team' for Claude CodeDev.to AIA Step-by-Step Guide to K-Nearest Neighbors (KNN) in Machine LearningDev.to AIOil prices extend gains after record monthly rally as Iran war fuels supply worriesCNBC TechnologyWhy Your "AI Assistant" is Obsolete: Welcoming the Era of Agentic Workflows & MCPDev.to AIBig Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.Dev.to AIHow to Create Viral Videos with AI in 2026Dev.to AIEmbers of Autoregression: Understanding Large Language Models Through theProblem They are Trained to SolveDev.to AIBuilding the Payment Gateway for AI Agents: A Technical Deep DiveDev.to AIOpenClaw is incredible until you deploy it wrongDev.to AIWhy Most Frontend Apps Are Smarter Than Their Engineers RealizeDev.to AIThis Isn’t Another ‘AI Productivity Hack’ ArticleMedium AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessCommunity Without Tokens: What AI Dev Tools Can Learn from Crypto's Community PlaybookDev.to AIGarry Tan's gstack: Install This 56k-Star 'Virtual Team' for Claude CodeDev.to AIA Step-by-Step Guide to K-Nearest Neighbors (KNN) in Machine LearningDev.to AIOil prices extend gains after record monthly rally as Iran war fuels supply worriesCNBC TechnologyWhy Your "AI Assistant" is Obsolete: Welcoming the Era of Agentic Workflows & MCPDev.to AIBig Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.Dev.to AIHow to Create Viral Videos with AI in 2026Dev.to AIEmbers of Autoregression: Understanding Large Language Models Through theProblem They are Trained to SolveDev.to AIBuilding the Payment Gateway for AI Agents: A Technical Deep DiveDev.to AIOpenClaw is incredible until you deploy it wrongDev.to AIWhy Most Frontend Apps Are Smarter Than Their Engineers RealizeDev.to AIThis Isn’t Another ‘AI Productivity Hack’ ArticleMedium AI

vGamba: Attentive State Space Bottleneck for efficient Long-range Dependencies in Visual Recognition

arXivMarch 31, 20262 min read0 views
Source Quiz

arXiv:2503.21262v3 Announce Type: replace Abstract: Capturing long-range dependencies (LRD) efficiently is a core challenge in visual recognition, and state-space models (SSMs) have recently emerged as a promising alternative to self-attention for addressing it. However, adapting SSMs into CNN-based bottlenecks remains challenging, as existing approaches require complex pre-processing and multiple SSM replicas per block, limiting their practicality. We propose vGamba, a hybrid vision backbone that replaces the standard bottleneck convolution with a single lightweight SSM block, the Gamba cell, — Yunusa Haruna, Adamu Lawan, Shamsuddeen Hassan Muhammad, Jiaquan Zhang, Chaoning Zhang

View PDF HTML (experimental)

Abstract:Capturing long-range dependencies (LRD) efficiently is a core challenge in visual recognition, and state-space models (SSMs) have recently emerged as a promising alternative to self-attention for addressing it. However, adapting SSMs into CNN-based bottlenecks remains challenging, as existing approaches require complex pre-processing and multiple SSM replicas per block, limiting their practicality. We propose vGamba, a hybrid vision backbone that replaces the standard bottleneck convolution with a single lightweight SSM block, the Gamba cell, which incorporates 2D positional awareness and an attentive spatial context (ASC) module for efficient LRD modeling. Results on diverse downstream vision tasks demonstrate competitive accuracy against SSM-based models such as VMamba and ViM, while achieving significantly improved computation and memory efficiency over Bottleneck Transformer (BotNet). For example, at $2048 \times 2048$ resolution, vGamba is $2.07 \times$ faster than BotNet and reduces peak GPU memory by 93.8% (1.03GB vs. 16.78GB), scaling near-linearly with resolution comparable to ResNet-50. These results demonstrate that Gamba Bottleneck effectively overcomes the memory and compute constraints of BotNet global modeling, establishing it as a practical and scalable backbone for high-resolution vision tasks.

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2503.21262 [cs.CV]

(or arXiv:2503.21262v3 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2503.21262

arXiv-issued DOI via DataCite

Submission history

From: Yunusa Haruna [view email] [v1] Thu, 27 Mar 2025 08:39:58 UTC (14,993 KB) [v2] Mon, 17 Nov 2025 13:12:41 UTC (12,902 KB) [v3] Mon, 30 Mar 2026 05:07:32 UTC (17,132 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
vGamba: Att…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 101 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers