
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

arXiv [Submitted on 1 Oct 2025 (v1), last revised 30 Mar 2026 (this version, v2)] · March 31, 2026
🧒 Explain Like I'm 5 (simple language)

Hey there, little explorer! Imagine you have a toy car. 🚗

Sometimes, grown-ups teach computers to learn about things, like your car. They show it pictures of the car from the front.

But what if the car is sideways or upside down? The computer might get confused!

So, smart grown-ups try to help the computer by showing it lots of pictures: the car from the front, the side, and even upside down! This is like making copies of your toy in different positions. They call this "augmenting."

This paper is like a detective story! 🕵️‍♀️ It asks: "Is it always a good idea to show the computer all those different pictures?" Sometimes, if the computer never sees the car upside down in real life, showing it upside-down pictures might actually confuse it more!

The detectives found that sometimes, showing too many weird pictures can make the computer not learn as well. It's like teaching a puppy to sit, but then also teaching it to stand on its head when it should just be sitting! 🐶

They made a special game to figure out when to show all the different pictures, and when it's better to just stick to the normal ones. It helps computers learn smarter! ✨

arXiv:2510.01349v2 · Authors: Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro, Yuxuan Chen, Tess Smidt, Robin Walters


Abstract: Symmetry-aware methods for machine learning, such as data augmentation and equivariant architectures, encourage correct model behavior on all transformations (e.g. rotations or permutations) of the original dataset. These methods can improve generalization and sample efficiency, under the assumption that the transformed datapoints are highly probable, or "important", under the test distribution. In this work, we develop a method for critically evaluating this assumption. In particular, we propose a metric to quantify the amount of symmetry breaking in a dataset, via a two-sample classifier test that distinguishes between the original dataset and its randomly augmented equivalent. We validate our metric on synthetic datasets, and then use it to uncover surprisingly high degrees of symmetry-breaking in several benchmark point cloud datasets, constituting a severe form of dataset bias. We show theoretically that distributional symmetry-breaking can prevent invariant methods from performing optimally even when the underlying labels are truly invariant, for invariant ridge regression in the infinite feature limit. Empirically, the implication for symmetry-aware methods is dataset-dependent: equivariant methods still impart benefits on some symmetry-biased datasets, but not others, particularly when the symmetry bias is predictive of the labels. Overall, these findings suggest that understanding equivariance -- both when it works, and why -- may require rethinking symmetry biases in the data.
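The abstract describes the metric only at a high level. The idea is to train a classifier to tell the original dataset apart from a randomly augmented copy. If the data distribution is truly invariant, the classifier should do no better than chance.

The following is a minimal sketch of that two-sample idea, not the authors' implementation. It rests on several assumptions, all of them hypothetical choices:

- The data are 2D points.
- The augmentation is a uniformly random rotation.
- The classifier is a plain NumPy logistic regression on quadratic features.
- The score is held-out accuracy.

The paper works with point clouds, and its actual classifier and score definition are not given here. The function names (`random_rotations`, `quadratic_features`, `symmetry_breaking_score`) are illustrative only.

```python
import numpy as np


def random_rotations(points, rng):
    """Rotate each 2D point by its own angle drawn uniformly from [0, 2*pi)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=len(points))
    c, s = np.cos(theta), np.sin(theta)
    x, y = points[:, 0], points[:, 1]
    return np.stack([c * x - s * y, s * x + c * y], axis=1)


def quadratic_features(points):
    """Hypothetical feature map: linear and quadratic terms, so a linear model can detect orientation."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x, y, x * x, y * y, x * y], axis=1)


def labeled_pairs(points, rng):
    """Original points get label 0; their randomly rotated copies get label 1."""
    augmented = random_rotations(points, rng)
    X = np.concatenate([quadratic_features(points), quadratic_features(augmented)])
    labels = np.concatenate([np.zeros(len(points)), np.ones(len(points))])
    return X, labels


def fit_logistic(X, labels, lr=0.5, steps=2000):
    """Full-batch gradient descent on the mean logistic loss (X already has a bias column)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        w -= lr * X.T @ (p - labels) / len(labels)
    return w


def symmetry_breaking_score(points, rng):
    """Held-out accuracy of an original-vs-augmented classifier (hypothetical score).

    About 0.5: the classifier cannot tell the data from its rotated copies,
    consistent with a rotation-invariant distribution. Values well above 0.5
    indicate distributional symmetry breaking.
    """
    # Split the *points* first so train and test sets use disjoint originals.
    order = rng.permutation(len(points))
    half = len(points) // 2
    X_train, y_train = labeled_pairs(points[order[:half]], rng)
    X_test, y_test = labeled_pairs(points[order[half:]], rng)

    # Standardize with training statistics, then append a bias column.
    mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12

    def with_bias(X):
        return np.hstack([(X - mean) / std, np.ones((len(X), 1))])

    w = fit_logistic(with_bias(X_train), y_train)
    preds = (with_bias(X_test) @ w > 0).astype(float)
    return float(np.mean(preds == y_test))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    isotropic = rng.normal(size=(2000, 2))                             # rotation-invariant
    aligned = np.array([3.0, 0.0]) + 0.3 * rng.normal(size=(2000, 2))  # clustered along +x
    print(symmetry_breaking_score(isotropic, rng))  # should be close to 0.5
    print(symmetry_breaking_score(aligned, rng))    # should be well above 0.5
```

The two toy datasets show both ends of the scale. An isotropic Gaussian looks the same after any rotation, so the classifier should score near chance. A cluster sitting on the +x axis is easy to tell apart from its rotated copies, which is a simple instance of the dataset bias the abstract describes.

A more flexible classifier would be needed to detect subtler symmetry breaking; the one used here is deliberately simple.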

Comments: Published as a conference paper at ICLR 2026. A short version of this paper appeared at the ICLR AI4Mat workshop in April 2025

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2510.01349 [cs.LG]

(or arXiv:2510.01349v2 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.01349 (arXiv-issued DOI via DataCite)

Submission history

From: Elyssa Hofgard

[v1] Wed, 1 Oct 2025 18:26:33 UTC (9,649 KB)

[v2] Mon, 30 Mar 2026 17:52:45 UTC (10,220 KB)
