Efficient Bilevel Optimization with KFAC-Based Hypergradients
Abstract: Bilevel optimization (BO) is widely applicable to many machine learning problems. Scaling BO, however, requires repeatedly computing hypergradients, which involves solving inverse Hessian-vector products (IHVPs). In practice, these operations are often approximated using crude surrogates such as one-step gradient unrolling or identity/short Neumann expansions, which discard curvature information. We build on implicit function theorem-based algorithms and propose to incorporate Kronecker-factored approximate curvature (KFAC), yielding curvature-aware hypergradients with a better performance-efficiency trade-off than conjugate gradient (CG) or Neumann methods, and consistently outperforming unrolling. We evaluate this approach across diverse tasks, including meta-learning and AI safety problems. On models up to BERT, we show that curvature information is valuable at scale, and that KFAC can provide it with only modest memory and runtime overhead. Our implementation is available at this https URL.
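To illustrate why a Kronecker-factored curvature estimate makes the IHVP cheap, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single layer whose curvature block is approximated as H ≈ G ⊗ A, with symmetric positive-definite factors A (input side) and G (output side), and it uses the identity (G ⊗ A)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for row-major vec. The function name `kron_factored_ihvp`, the helper `random_spd`, and the factor-wise damping scheme are illustrative assumptions.

```python
import numpy as np

def kron_factored_ihvp(A, G, V, damping=1e-3):
    """Return X such that ((G + dI) kron (A + dI)) vec(X) = vec(V).

    A: (d_in, d_in) symmetric factor, G: (d_out, d_out) symmetric factor,
    V: (d_out, d_in) vector to precondition, shaped like the layer's weights.
    Damping is applied to each factor separately (an assumption of this sketch).
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = np.linalg.solve(G_d, V)            # G^{-1} V
    return np.linalg.solve(A_d, left.T).T     # (G^{-1} V) A^{-1}, using A symmetric

def random_spd(rng, n):
    """Hypothetical helper: a well-conditioned symmetric positive-definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T / n + np.eye(n)

rng = np.random.default_rng(0)
d_in, d_out, damp = 5, 3, 1e-3
A = random_spd(rng, d_in)
G = random_spd(rng, d_out)
V = rng.standard_normal((d_out, d_in))

# Two small solves: O(d_in^3 + d_out^3) instead of O((d_in * d_out)^3).
fast = kron_factored_ihvp(A, G, V, damping=damp)

# Dense reference: build the full Kronecker matrix and solve directly.
H = np.kron(G + damp * np.eye(d_out), A + damp * np.eye(d_in))
dense = np.linalg.solve(H, V.ravel()).reshape(d_out, d_in)

print("max abs difference:", np.max(np.abs(fast - dense)))
```

In an implicit-function-theorem hypergradient, a solve of this kind would stand in for the exact inverse inner-problem Hessian applied to the gradient of the outer loss. Only the two small factors are ever inverted, which is where the memory and runtime savings come from.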
Comments: 25 pages, AISTATS 2026
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.29108 [cs.LG]
(or arXiv:2603.29108v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29108
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Disen Liao
[v1] Tue, 31 Mar 2026 00:54:31 UTC (2,262 KB)