
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

arXiv · March 31, 2026


Authors: He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen


Abstract: We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and MACA on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where our Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
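The loop described in the abstract (an archive of candidates, structured feedback on compilation, correctness, and speedup, and retention of correctness-preserving, high-gain revisions) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_evaluate`, `toy_improve`, `evolve`, the `min_gain` threshold, and the use of an integer block size as a stand-in for a kernel are all hypothetical, chosen only to make the control flow concrete.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Structured execution feedback for one candidate."""
    compiled: bool
    correct: bool
    speedup: float  # relative to a baseline; 0.0 when the candidate is unusable


def toy_evaluate(block: int) -> Feedback:
    """Hypothetical stand-in for a backend evaluation service."""
    if block <= 0 or block > 1024:
        return Feedback(False, False, 0.0)   # treated as "fails to compile"
    if block % 32 != 0:
        return Feedback(True, False, 0.0)    # "compiles but gives wrong results"
    # Toy speedup curve that peaks at block == 256.
    return Feedback(True, True, 1.0 + 1.0 / (1.0 + abs(block - 256) / 64))


def toy_improve(parent: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the model proposing a local revision."""
    return parent + rng.choice([-64, -32, -16, 16, 32, 64])


def evolve(seed_block, steps, archive_size=4, min_gain=0.05, rng_seed=0):
    rng = random.Random(rng_seed)
    archive = {seed_block: toy_evaluate(seed_block)}  # candidate -> feedback
    kept_steps = []  # (parent, child, gain) revisions worth training on
    for _ in range(steps):
        parent = rng.choice(list(archive))
        child = toy_improve(parent, rng)
        fb = toy_evaluate(child)
        if not (fb.compiled and fb.correct):
            continue  # discard candidates that break compilation or correctness
        gain = fb.speedup - archive[parent].speedup
        if gain > min_gain:
            kept_steps.append((parent, child, gain))  # correct and high-gain
        archive[child] = fb
        if len(archive) > archive_size:
            # Evict the slowest entry; dict keys keep the candidates distinct,
            # which is only a crude proxy for diversity.
            worst = min(archive, key=lambda b: archive[b].speedup)
            del archive[worst]
    best = max(archive, key=lambda b: archive[b].speedup)
    return best, archive, kept_steps


if __name__ == "__main__":
    best, archive, kept = evolve(seed_block=32, steps=200)
    print("best block:", best, "speedup:", round(archive[best].speedup, 3))
    print("retained revisions:", len(kept))
```

In this sketch, `kept_steps` plays the role of the step-centric training signal: each entry is a single parent-to-child revision that stayed correct and improved speedup by more than the assumed threshold, rather than a whole trajectory.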

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:2603.28342 [cs.CL]

(or arXiv:2603.28342v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.28342

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: He Du
[v1] Mon, 30 Mar 2026 12:12:49 UTC (994 KB)
