
GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways

Submitted to arXiv on 23 Mar 2026. Posted March 31, 2026.

Authors: Soudeep Ghoshal, Himanshu Buckchash


Abstract: Deep ConvNets suffer from gradient signal degradation as network depth increases, limiting effective feature learning in complex architectures. ResNet addressed this through residual connections, but these fixed short-circuits cannot adapt to varying input complexity or selectively emphasize task-relevant features across network hierarchies. This study introduces GradAttn, a hybrid CNN-transformer framework that replaces fixed residual connections with attention-controlled gradient flow. By extracting multi-scale CNN features at different depths and regulating them through self-attention, GradAttn dynamically weights shallow texture features and deep semantic representations. For representational analysis, we evaluated three GradAttn variants across eight diverse datasets spanning natural images, medical imaging, and fashion recognition. Results demonstrate that GradAttn outperforms ResNet-18 on five of eight datasets, achieving up to +11.07% accuracy improvement on FashionMNIST while maintaining comparable network size. Gradient flow analysis reveals that controlled instabilities, introduced by attention, often coincide with improved generalization, challenging the assumption that perfect stability is optimal. Furthermore, positional encoding effectiveness proves dataset-dependent, with CNN hierarchies frequently encoding sufficient spatial structure. These findings position attention mechanisms as enablers of learnable gradient control, offering a new paradigm for adaptive representation learning in deep neural architectures.
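The idea of weighting features from different depths through self-attention can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so everything below is an assumption made for illustration. It treats one pooled feature vector per CNN stage as a token, applies single-head scaled dot-product self-attention with identity query/key/value projections (no learned weights), and mean-pools the result. The function names `self_attention` and `fuse_stages` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so there are no learned weights.
    Returns the attended tokens and the attention weight matrix.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        w = softmax([dot(q, k) / scale for k in tokens])
        out = [sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def fuse_stages(stage_features):
    """Attend across per-stage feature vectors, then mean-pool the tokens."""
    attended, weights = self_attention(stage_features)
    d = len(attended[0])
    fused = [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]
    return fused, weights


# Hypothetical pooled features from three depths, already projected
# to a common width of 4 (values are made up for illustration).
shallow = [0.9, 0.1, 0.0, 0.2]  # texture-like features
middle = [0.4, 0.5, 0.3, 0.1]
deep = [0.0, 0.2, 0.8, 0.7]     # semantic-like features

fused, weights = fuse_stages([shallow, middle, deep])
for name, row in zip(["shallow", "middle", "deep"], weights):
    print(name, [round(w, 3) for w in row])
```

Unlike a fixed skip connection, which always adds its input with weight 1, the mixing weights here are computed from the features themselves, so they change with each input. In a trained model the projections would be learned, which is presumably where the task-dependent modulation described in the abstract comes from.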

Comments: 14 pages, 5 figures. Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26756 [cs.CV] (or arXiv:2603.26756v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26756

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Soudeep Ghoshal
[v1] Mon, 23 Mar 2026 14:45:07 UTC (1,514 KB)

Original source: arXiv, https://arxiv.org/abs/2603.26756
