RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning
arXiv:2603.27630v1 Announce Type: cross Abstract: Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correc — Xinyu Zhang, Zhiteng Chao, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jianan Mu, Jing Justin Ye, Huawei Li
View PDF HTML (experimental)
Abstract:Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correctness and diversity. Our Diversity-Centric Multi-Objective Reward Scheduling integrates expert knowledge with EDA feedback, and a three-stage framework maximizes the utility of limited data. Experiments on the RTLLM benchmark show that RTLSeek surpasses prior methods, with ablation results confirming that encouraging broader design-space exploration improves RTL quality and achieves the principle of "the more generated, the better results." Implementation framework, including the dataset, source code, and model weights, is shown at this https URL.
Comments: 8 pages, 6 figures
Subjects:
Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Cite as: arXiv:2603.27630 [cs.AR]
(or arXiv:2603.27630v1 [cs.AR] for this version)
https://doi.org/10.48550/arXiv.2603.27630
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yonghao Wang [view email] [v1] Sun, 29 Mar 2026 11:01:02 UTC (460 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivKnowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Beyond Metadata: Multimodal, Policy-Aware Detection of YouTube Scam Videos
arXiv:2509.23418v2 Announce Type: replace Abstract: YouTube is a major platform for information and entertainment, but its wide accessibility also makes it attractive for scammers to upload deceptive or malicious content. Prior detection approaches rely largely on textual or statistical metadata, such as titles, descriptions, view counts, or likes, which are effective in many cases but can be evaded through benign-looking text, manipulated statistics, or other obfuscation strategies (e.g., 'Leetspeak'), while ignoring visual cues. In this study, we systematically investigate multimodal approaches for detecting YouTube scams. Our dataset consolidates established scam categories and augments them with full-length videos and policy-grounded reasoning annotations. Experiments show that a text-

Online Flow Time Minimization: Tight Bounds for Non-Preemptive Algorithms
arXiv:2511.03485v3 Announce Type: replace Abstract: This paper studies the online scheduling problem of minimizing total flow time for $n$ jobs on $m$ identical machines. A classical $\Omega(n)$ lower bound shows that no deterministic single-machine algorithm can beat the trivial greedy, even when $n$ is known in advance. However, this barrier is specific to deterministic algorithms on a single machine, leaving open what randomization, multiple machines, or the kill-and-restart capability can achieve. We give a nearly complete answer. For randomized non-preemptive algorithms, we establish a tight $\Theta(\sqrt{n/m})$ competitive ratio, which also improves the best offline approximation to $O(\sqrt{n/m})$. For deterministic non-preemptive algorithms on multiple machines, we prove an $O(n/m^

On the average-case complexity landscape for Tensor-Isomorphism-complete problems over finite fields
arXiv:2604.00591v1 Announce Type: cross Abstract: In Grochow and Qiao (SIAM J. Comput., 2021), the complexity class Tensor Isomorphism (TI) was introduced and isomorphism problems for groups, algebras, and polynomials were shown to be TI-complete. In this paper, we study average-case algorithms for several TI-complete problems over finite fields, including algebra isomorphism, matrix code conjugacy, and $4$-tensor isomorphism. Our main results are as follows. Over the finite field of order $q$, we devise (1) average-case polynomial-time algorithms for algebra isomorphism and matrix code conjugacy that succeed in a $1/\Theta(q)$ fraction of inputs and (2) an average-case polynomial-time algorithm for the $4$-tensor isomorphism that succeeds in a $1/q^{\Theta(1)}$ fraction of inputs. Prior t


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!