
Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance

arXiv cs.LG, by Siva Kumar Sastry Hari, Vignesh Balaji, Sana Damani, Qijing Huang, Christos Kozyrakis. April 1, 2026


Abstract: Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials will save both runtime and cost. We make two key observations. First, the abstraction level that agents operate at is important. If it is too low, the LLM wastes reasoning on low-impact details. If it is too high, it may miss important optimization choices. Second, agents cannot easily tell when they reach the point of diminishing returns, wasting resources as they continue searching. These observations motivate two design principles to improve efficiency: (1) a compact domain-specific language (DSL) that can be learned in context and lets the model reason at a higher level while preserving important optimization levers, and (2) Speed-of-Light (SOL) guidance that uses first-principles performance bounds to steer and budget search. We implement these principles in $\mu$CUTLASS, a DSL with a compiler for CUTLASS-backed GPU kernels that covers kernel configuration, epilogue fusion, and multi-stage pipelines. We use SOL guidance to estimate headroom and guide optimization trials, deprioritize problems that are near SOL, and flag kernels that game the benchmark. On 59 KernelBench problems with the same iteration budgets, switching from generating low-level code to DSL code using GPT-5-mini turns a 0.40x geomean regression into a 1.27x speedup over PyTorch. Adding SOL-guided steering raises this to 1.56x. Across model tiers, $\mu$CUTLASS + SOL-guidance lets weaker models outperform stronger baseline agents at lower token cost. SOL-guided budgeting saves 19-43% of tokens while retaining at least 95% of geomean speedup, with the best policy reaching a 1.68x efficiency gain. Lastly, SOL analysis helps detect benchmark-gaming cases, where kernels may appear fast while failing to perform the intended computation.
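To make the SOL idea concrete, here is a minimal Python sketch of how a first-principles bound could drive the three uses the abstract names: estimating headroom, deprioritizing problems near SOL, and flagging suspiciously fast kernels. The abstract does not specify the paper's SOL model, so this sketch assumes a simple roofline bound (the larger of compute time and memory time). The names `Device`, `sol_time`, `headroom`, and `triage`, the 1.25x threshold, and the hardware peaks are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical hardware peaks; not taken from the paper."""
    peak_flops: float      # floating-point operations per second
    peak_bandwidth: float  # bytes per second of memory traffic


def sol_time(flops: float, bytes_moved: float, dev: Device) -> float:
    """Assumed roofline bound: no kernel doing this work can finish sooner."""
    compute_time = flops / dev.peak_flops
    memory_time = bytes_moved / dev.peak_bandwidth
    return max(compute_time, memory_time)


def headroom(measured: float, sol: float) -> float:
    """Ratio of measured runtime to the bound; 1.0 means running at SOL."""
    return measured / sol


def triage(measured: float, sol: float, near_sol: float = 1.25) -> str:
    """Classify a kernel by its headroom (the threshold is an assumption)."""
    h = headroom(measured, sol)
    if h < 1.0:
        # Faster than the bound: the kernel may be skipping the real work.
        return "suspect"
    if h <= near_sol:
        # Little left to gain, so spend the search budget elsewhere.
        return "deprioritize"
    return "optimize"


# Example: a 4096 x 4096 x 4096 FP16 matrix multiply on the hypothetical device.
n = 4096
dev = Device(peak_flops=1.0e14, peak_bandwidth=1.0e12)
gemm_flops = 2 * n**3            # one multiply and one add per inner-product term
gemm_bytes = 3 * n * n * 2       # read A and B, write C, 2 bytes per element
gemm_sol = sol_time(gemm_flops, gemm_bytes, dev)

for measured_seconds in (5.0e-3, 1.5e-3, 1.0e-4):
    print(measured_seconds, triage(measured_seconds, gemm_sol))
```

Under these assumptions the bound is compute-limited, so a measured runtime well below it cannot come from a kernel that actually performs the full multiply. That is the intuition behind using SOL analysis to catch benchmark gaming.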

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29010 [cs.LG] (or arXiv:2603.29010v1 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.29010 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Siva Kumar Sastry Hari. [v1] Mon, 30 Mar 2026 21:16:39 UTC (6,047 KB)
