DGPO: RL-Steered Graph Diffusion for Neural Architecture Generation
arXiv:2602.19261v2 Announce Type: replace-cross Abstract: Reinforcement learning fine-tuning has proven effective for steering generative diffusion models toward desired properties in image and molecular domains. Graph diffusion models have similarly been applied to combinatorial structure generation, including neural architecture search (NAS). However, neural architectures are directed acyclic graphs (DAGs) where edge direction encodes functional semantics such as data flow-information that existing graph diffusion methods, designed for undirected structures, discard. We propose Directed Graph Policy Optimization (DGPO), which extends reinforcement learning fine-tuning of discrete graph diffusion models to DAGs via topological node ordering and positional encoding. Validated on NAS-Bench-
View PDF HTML (experimental)
Abstract:Reinforcement learning fine-tuning has proven effective for steering generative diffusion models toward desired properties in image and molecular domains. Graph diffusion models have similarly been applied to combinatorial structure generation, including neural architecture search (NAS). However, neural architectures are directed acyclic graphs (DAGs) where edge direction encodes functional semantics such as data flow-information that existing graph diffusion methods, designed for undirected structures, discard. We propose Directed Graph Policy Optimization (DGPO), which extends reinforcement learning fine-tuning of discrete graph diffusion models to DAGs via topological node ordering and positional encoding. Validated on NAS-Bench-101 and NAS-Bench-201, DGPO matches the benchmark optimum on all three NAS-Bench-201 tasks (91.61%, 73.49%, 46.77%). The central finding is that the model learns transferable structural priors: pretrained on only 7% of the search space, it generates near-oracle architectures after fine-tuning, within 0.32 percentage points of the full-data model and extrapolating 7.3 percentage points beyond its training ceiling. Bidirectional control experiments confirm genuine reward-driven steering, with inverse optimization reaching near random-chance accuracy (9.5%). These results demonstrate that reinforcement learning-steered discrete diffusion, once extended to handle directionality, provides a controllable generative framework for directed combinatorial structures.
Comments: Submitted to IJCNN 2026 (IEEE WCCI). 7 pages, 4 figures
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2602.19261 [cs.LG]
(or arXiv:2602.19261v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2602.19261
arXiv-issued DOI via DataCite
Submission history
From: Aleksei Liuliakov [view email] [v1] Sun, 22 Feb 2026 16:23:42 UTC (206 KB) [v2] Mon, 30 Mar 2026 18:45:29 UTC (207 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modelbenchmarktraining
Choosing an AI Agent Orchestrator in 2026: A Practical Comparison
Running one AI coding agent is easy. Running three in parallel on the same codebase is where things get interesting — and where you need to make a tooling choice. There's no "best" orchestrator. There's the right one for your workflow. Here's an honest comparison of five approaches, with the tradeoffs I've seen after months of running multi-agent setups. The Options 1. Raw tmux Scripts What it is: Shell scripts that launch agents in tmux panes. DIY orchestration. Pros: Zero dependencies beyond tmux Full control over every detail No abstractions to fight You already know how it works Cons: No state management — you track everything manually No message routing between agents No test gating — agents declare "done" without verification Breaks when agents crash or hit context limits You become

How AI Is Changing the Way We Build Online Businesses
Not long ago, building an online business meant: months of development hiring developers large upfront costs Today? AI has completely changed the game. Now, one person can go from idea → to revenue faster than ever before. And this shift is just getting started. ⚠️ The Old Way vs The New Way Before AI: Build everything from scratch Spend weeks on infrastructure Launch slowly Iterate even slower With AI: Build faster Automate key tasks Launch quickly Iterate in real time The difference is massive. 🧠 AI Is Reducing the Cost of Building One of the biggest changes: 👉 Building is no longer the bottleneck AI helps with: generating content writing code automating workflows handling repetitive tasks What used to take weeks… 👉 now takes days ⚙️ Infrastructure Is No Longer the Hard Part Another s

Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!