
Efficient Bilevel Optimization with KFAC-Based Hypergradients

arXiv cs.LG | by Disen Liao, Felix Dangel, Yaoliang Yu | April 1, 2026


Abstract: Bilevel optimization (BO) is widely applicable to many machine learning problems. Scaling BO, however, requires repeatedly computing hypergradients, which involves solving inverse Hessian-vector products (IHVPs). In practice, these operations are often approximated using crude surrogates such as one-step gradient unrolling or identity/short Neumann expansions, which discard curvature information. We build on implicit function theorem-based algorithms and propose to incorporate Kronecker-factored approximate curvature (KFAC), yielding curvature-aware hypergradients with a better performance-efficiency trade-off than Conjugate Gradient (CG) or Neumann methods and consistently outperforming unrolling. We evaluate this approach across diverse tasks, including meta-learning and AI safety problems. On models up to BERT, we show that curvature information is valuable at scale, and KFAC can provide it with only modest memory and runtime overhead. Our implementation is available at this https URL.
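To make the role of the IHVP concrete, here is a minimal sketch on a toy problem; it is not the authors' implementation and does not reproduce KFAC. It assumes a ridge-regression inner problem with a scalar weight-decay hyperparameter `lam` and a validation-loss outer objective, both chosen purely for illustration. Under the implicit function theorem, the hypergradient for this toy problem is -theta*^T H^{-1} g, where H is the inner Hessian and g is the outer gradient with respect to theta. The sketch compares an exact solve with the two baselines the abstract names, a truncated Neumann expansion and conjugate gradient. All function and variable names are hypothetical.

```python
import numpy as np

# Toy data (assumption: synthetic linear regression, not from the paper).
rng = np.random.default_rng(0)
n, m, d = 40, 30, 5
X, Xv = rng.normal(size=(n, d)), rng.normal(size=(m, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
yv = Xv @ w_true + 0.1 * rng.normal(size=m)


def inner_solution(lam):
    """Minimizer of 0.5*||X th - y||^2/n + 0.5*lam*||th||^2 and its Hessian."""
    H = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(H, X.T @ y / n), H


def outer_loss(theta):
    """Validation loss 0.5*||Xv th - yv||^2/m."""
    r = Xv @ theta - yv
    return 0.5 * r @ r / m


def ihvp_neumann(hvp, v, steps, alpha):
    """Truncated Neumann series: alpha * sum_{k<steps} (I - alpha*H)^k v."""
    term, total = v.copy(), v.copy()
    for _ in range(steps - 1):
        term = term - alpha * hvp(term)
        total = total + term
    return alpha * total


def ihvp_cg(hvp, v, steps, tol=1e-20):
    """Plain conjugate gradient for H x = v, with H symmetric positive definite."""
    x = np.zeros_like(v)
    r = v.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(steps):
        if rs < tol:
            break
        Hp = hvp(p)
        a = rs / (p @ Hp)
        x = x + a * p
        r = r - a * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


lam = 0.1
theta, H = inner_solution(lam)
hvp = lambda u: H @ u                         # stands in for an autodiff HVP
g_outer = Xv.T @ (Xv @ theta - yv) / m        # dF/dtheta at the inner solution
alpha = 1.0 / np.linalg.eigvalsh(H).max()     # step size that keeps the series convergent

ihvp_exact = np.linalg.solve(H, g_outer)
ihvp_short = ihvp_neumann(hvp, g_outer, steps=2, alpha=alpha)
ihvp_long = ihvp_neumann(hvp, g_outer, steps=200, alpha=alpha)
ihvp_conj = ihvp_cg(hvp, g_outer, steps=2 * d)

# IFT hypergradient for this toy problem: d2L/(dlam dtheta) = theta.
hypergrad_exact = -theta @ ihvp_exact

# Central finite difference through the inner solve, as a sanity check.
eps = 1e-5
hypergrad_fd = (outer_loss(inner_solution(lam + eps)[0])
                - outer_loss(inner_solution(lam - eps)[0])) / (2 * eps)

err_short = np.linalg.norm(ihvp_short - ihvp_exact)
err_long = np.linalg.norm(ihvp_long - ihvp_exact)
print("IFT vs finite difference:", hypergrad_exact, hypergrad_fd)
print("Neumann IHVP error (2 vs 200 terms):", err_short, err_long)
```

In this sketch the Hessian is formed explicitly only because the problem is tiny. For a neural network, `hvp` would be a matrix-free Hessian-vector product, and the accuracy of the IHVP would depend on how many Neumann or CG iterations one can afford, which is the cost the abstract proposes to reduce with a Kronecker-factored curvature approximation.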

Comments: 25 pages, AISTATS 2026

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.29108 [cs.LG]

(or arXiv:2603.29108v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.29108

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Disen Liao
[v1] Tue, 31 Mar 2026 00:54:31 UTC (2,262 KB)
