Understanding SAM's Robustness to Noisy Labels through Gradient Down-weighting
arXiv:2411.17132v2 Announce Type: replace Abstract: Sharpness-Aware Minimization (SAM) was introduced to improve generalization by seeking flat minima, yet it also exhibits robustness to label noise, a phenomenon that remains only partially understood. Prior work has mainly attributed this effect to SAM's tendency to prolong the learning of clean samples. In this work, we provide a complementary explanation by analyzing SAM at the element-wise level. We show that when noisy gradients dominate a parameter direction, their influence is reduced by the stronger amplification of clean gradients. Th — Hoang-Chau Luong, Quang-Thuc Nguyen, Dat Ba Tran, Minh-Triet Tran
View PDF HTML (experimental)
Abstract:Sharpness-Aware Minimization (SAM) was introduced to improve generalization by seeking flat minima, yet it also exhibits robustness to label noise, a phenomenon that remains only partially understood. Prior work has mainly attributed this effect to SAM's tendency to prolong the learning of clean samples. In this work, we provide a complementary explanation by analyzing SAM at the element-wise level. We show that when noisy gradients dominate a parameter direction, their influence is reduced by the stronger amplification of clean gradients. This slows the memorization of noisy labels while sustaining clean learning, offering a more complete account of SAM's robustness. Building on this insight, we propose SANER (Sharpness-Aware Noise-Explicit Reweighting), a simple variant of SAM that explicitly magnifies this down-weighting effect. Experiments on benchmark image classification tasks with noisy labels demonstrate that SANER significantly mitigates noisy-label memorization and improves generalization over both SAM and SGD. Moreover, since SANER is designed from the mechanism of SAM, it can also be seamlessly integrated into SAM-like variants, further boosting their robustness.
Subjects:
Machine Learning (cs.LG)
Cite as: arXiv:2411.17132 [cs.LG]
(or arXiv:2411.17132v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2411.17132
arXiv-issued DOI via DataCite
Submission history
From: Hoang-Chau Luong [view email] [v1] Tue, 26 Nov 2024 05:54:12 UTC (849 KB) [v2] Mon, 30 Mar 2026 17:14:51 UTC (282 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices
In a study, a new “dynamic wavelet matrix” was used as a vector database, where the memory grows only with log(σ) instead of with n. I considered building a KNN model with a huge memory, capable of holding, for example, 5 billion vectors. First, the words in the context window are converted into an embedding using deberta-v3-small. This is a fast encoder that also takes the position of the tokens into account (disentangled attention) and is responsible for the context in the model. The embedding is then converted into a bit sequence using binary quantization, where dimensions greater than 0 are converted to 1 and otherwise to 0. The advantage is that bit sequences are compressible and are entered into the dynamic wavelet matrix, where the memory grows only with log(σ). A response token is
![[D] ICML reviewer making up false claim in acknowledgement, what to do?](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-matrix-rain-CvjLrWJiXfamUnvj5xT9J9.webp)
[D] ICML reviewer making up false claim in acknowledgement, what to do?
In a rebuttal acknowledgement we received, the reviewer made up a claim that our method performs worse than baselines with some hyperparameter settings. We did do a comprehensive list of hyperparameter comparisons and the reviewer's claim is not supported by what's presented in the paper. In this case what can we do? submitted by /u/dontknowwhattoplay [link] [comments]
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
![[D] ICML reviewer making up false claim in acknowledgement, what to do?](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-matrix-rain-CvjLrWJiXfamUnvj5xT9J9.webp)
[D] ICML reviewer making up false claim in acknowledgement, what to do?
In a rebuttal acknowledgement we received, the reviewer made up a claim that our method performs worse than baselines with some hyperparameter settings. We did do a comprehensive list of hyperparameter comparisons and the reviewer's claim is not supported by what's presented in the paper. In this case what can we do? submitted by /u/dontknowwhattoplay [link] [comments]


Researchers 3D print robot the size of a single-cell organism — devices move and navigate even without a ‘brain,’ uses their shape and the environment to get going
Researchers 3D print robot the size of a single-cell organism — devices move and navigate even without a ‘brain,’ uses their shape and the environment to get going



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!