
DGPO: RL-Steered Graph Diffusion for Neural Architecture Generation

arXiv cs.NE. By Aleksei Liuliakov, Luca Hermes, Barbara Hammer. April 1, 2026



Abstract: Reinforcement learning fine-tuning has proven effective for steering generative diffusion models toward desired properties in image and molecular domains. Graph diffusion models have similarly been applied to combinatorial structure generation, including neural architecture search (NAS). However, neural architectures are directed acyclic graphs (DAGs) in which edge direction encodes functional semantics such as data flow, information that existing graph diffusion methods, designed for undirected structures, discard. We propose Directed Graph Policy Optimization (DGPO), which extends reinforcement learning fine-tuning of discrete graph diffusion models to DAGs via topological node ordering and positional encoding. Validated on NAS-Bench-101 and NAS-Bench-201, DGPO matches the benchmark optimum on all three NAS-Bench-201 tasks (91.61%, 73.49%, 46.77%). The central finding is that the model learns transferable structural priors: pretrained on only 7% of the search space, it generates near-oracle architectures after fine-tuning, within 0.32 percentage points of the full-data model and extrapolating 7.3 percentage points beyond its training ceiling. Bidirectional control experiments confirm genuine reward-driven steering, with inverse optimization reaching near random-chance accuracy (9.5%). These results demonstrate that reinforcement learning-steered discrete diffusion, once extended to handle directionality, provides a controllable generative framework for directed combinatorial structures.
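The abstract names topological node ordering and positional encoding as the way direction is made visible to the diffusion model, but it does not give the details. The sketch below is one plausible reading, not the authors' implementation: it ranks the nodes of a small DAG with Kahn's algorithm and assigns each node a sinusoidal encoding of its rank. The function names, the tie-breaking rule, the sinusoidal form, and the four-node example graph are all assumptions made for illustration.

```python
import heapq
import math


def topological_order(num_nodes, edges):
    """Return a topological order of a DAG using Kahn's algorithm.

    Ties are broken by smallest node index (an illustrative choice,
    not something the abstract specifies).
    """
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Min-heap of nodes whose predecessors have all been emitted.
    ready = [n for n in range(num_nodes) if in_degree[n] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


def sinusoidal_encoding(position, dim):
    """Transformer-style sinusoidal encoding of an integer position (assumed form)."""
    encoding = []
    for i in range(dim):
        frequency = 1.0 / (10000 ** ((2 * (i // 2)) / dim))
        angle = position * frequency
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def node_positional_encodings(num_nodes, edges, dim=4):
    """Map each node id to the encoding of its rank in the topological order."""
    order = topological_order(num_nodes, edges)
    rank = {node: position for position, node in enumerate(order)}
    return [sinusoidal_encoding(rank[node], dim) for node in range(num_nodes)]


# Hypothetical 4-node cell: node 2 is the input, node 3 is the output.
example_edges = [(2, 0), (2, 1), (0, 3), (1, 3)]
example_order = topological_order(4, example_edges)
example_encodings = node_positional_encodings(4, example_edges, dim=4)
```

Because each node's encoding depends on its rank rather than its arbitrary index, two nodes joined by an edge always receive distinguishable, ordered positions, which is one way a model built for undirected graphs could recover which endpoint is the source.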

Comments: Submitted to IJCNN 2026 (IEEE WCCI). 7 pages, 4 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Cite as: arXiv:2602.19261 [cs.LG]

(or arXiv:2602.19261v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2602.19261

arXiv-issued DOI via DataCite

Submission history

From: Aleksei Liuliakov
[v1] Sun, 22 Feb 2026 16:23:42 UTC (206 KB)
[v2] Mon, 30 Mar 2026 18:45:29 UTC (207 KB)
