Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance
Abstract: Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials save both runtime and cost. We make two key observations. First, the abstraction level that agents operate at is important. If it is too low, the LLM wastes reasoning on low-impact details. If it is too high, it may miss important optimization choices. Second, agents cannot easily tell when they reach the point of diminishing returns, wasting resources as they continue searching. These observations motivate two design principles to improve efficiency: (1) a compact domain-specific language (DSL) that can be learned in context and lets the model reason at a higher level while preserving important optimization levers, and (2) Speed-of-Light (SOL) guidance that uses first-principles performance bounds to steer and budget search. We implement these principles in $\mu$CUTLASS, a DSL with a compiler for CUTLASS-backed GPU kernels that covers kernel configuration, epilogue fusion, and multi-stage pipelines. We use SOL guidance to estimate headroom and guide optimization trials, deprioritize problems that are near SOL, and flag kernels that game the benchmark. On 59 KernelBench problems with the same iteration budgets, switching from generating low-level code to DSL code using GPT-5-mini turns a 0.40x geomean regression into a 1.27x speedup over PyTorch. Adding SOL-guided steering raises this to 1.56x. Across model tiers, $\mu$CUTLASS + SOL guidance lets weaker models outperform stronger baseline agents at lower token cost. SOL-guided budgeting saves 19-43% of tokens while retaining at least 95% of geomean speedup, with the best policy reaching a 1.68x efficiency gain. Lastly, SOL analysis helps detect benchmark-gaming cases, where kernels may appear fast while failing to perform the intended computation.
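The Speed-of-Light guidance described in the abstract rests on a standard roofline-style bound: a kernel can run no faster than the larger of its compute-bound and memory-bound time limits. A minimal sketch of such a headroom estimate follows; the function names, hardware numbers, and workload here are illustrative assumptions, not the paper's implementation.

```python
def sol_time_estimate(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline-style Speed-of-Light (SOL) lower bound on kernel time.

    The kernel cannot finish faster than either its compute limit
    (total FLOPs / peak FLOP rate) or its memory limit
    (total bytes moved / peak bandwidth), so SOL = max of the two.
    """
    compute_limit = flops / peak_flops      # seconds if purely compute-bound
    memory_limit = bytes_moved / peak_bw    # seconds if purely memory-bound
    return max(compute_limit, memory_limit)


def sol_fraction(measured_time, flops, bytes_moved, peak_flops, peak_bw):
    """Fraction of the SOL bound achieved (1.0 = kernel is at SOL).

    Low values signal headroom worth more optimization trials;
    values near 1.0 suggest diminishing returns.
    """
    return sol_time_estimate(flops, bytes_moved, peak_flops, peak_bw) / measured_time


# Illustrative numbers for a hypothetical GPU: 100 TFLOP/s peak
# compute, 2 TB/s peak memory bandwidth, and a 4096^3 fp16 GEMM.
PEAK_FLOPS = 100e12
PEAK_BW = 2e12
flops = 2 * 4096**3            # 2*M*N*K multiply-adds for the matmul
bytes_moved = 3 * 4096**2 * 2  # read A and B, write C, 2 bytes/element

sol = sol_time_estimate(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
frac = sol_fraction(2e-3, flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
# This workload is compute-bound (SOL ~ 1.37 ms), so a measured 2 ms
# kernel sits at roughly 69% of SOL: headroom remains, but not 2x.
```

In an agent loop, a fraction like this can budget the search: problems already near SOL are deprioritized, while a kernel reporting a time *below* its SOL bound is physically impossible and flags likely benchmark gaming.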
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29010 [cs.LG]
(or arXiv:2603.29010v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29010
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Siva Kumar Sastry Hari [view email] [v1] Mon, 30 Mar 2026 21:16:39 UTC (6,047 KB)