
Dynamic Dual-Granularity Skill Bank for Agentic RL

arXiv, March 31, 2026

Authors: Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao


Abstract: Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
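The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of how a dual-granularity skill bank with gap-based utility might be organized. It is not the authors' implementation. Every name here (`Skill`, `SkillBank`, `capacity`, `lr`), the exponential-moving-average utility update, and the word-overlap retrieval score are assumptions made for illustration; the paper's actual formulas are not given in the abstract. The use of the gap for policy optimization and the reflection step that generates skill text are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str             # natural-language guidance
    granularity: str      # "task" (high-level) or "step" (fine-grained)
    utility: float = 0.0  # running hindsight utility (assumed form)
    uses: int = 0         # how many paired rollouts have scored this skill


@dataclass
class SkillBank:
    capacity: int = 4     # hypothetical size limit that triggers pruning
    lr: float = 0.5       # hypothetical step size for the utility average
    skills: list = field(default_factory=list)

    def add(self, text, granularity):
        """Expand the bank with a new skill, e.g. one produced by reflection."""
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def update(self, skill, baseline_return, injected_return):
        """Move the skill's utility toward the gap between paired rollouts."""
        gap = injected_return - baseline_return
        skill.utility += self.lr * (gap - skill.utility)
        skill.uses += 1
        return gap

    def retrieve(self, query, granularity, k=1):
        """Rank skills of one granularity by word overlap plus utility."""
        words = set(query.lower().split())

        def score(s):
            overlap = len(words & set(s.text.lower().split()))
            return overlap + s.utility

        pool = [s for s in self.skills if s.granularity == granularity]
        return sorted(pool, key=score, reverse=True)[:k]

    def prune(self):
        """Keep only the `capacity` highest-utility skills."""
        self.skills.sort(key=lambda s: s.utility, reverse=True)
        del self.skills[self.capacity:]
```

In this sketch, a skill whose injected rollout beats the baseline gains utility and is favored at retrieval time, while a skill that consistently hurts performance drifts negative and is the first to be pruned. A real system would presumably replace the word-overlap score with a learned or embedding-based retriever.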

Comments: 12 pages

Subjects: Artificial Intelligence (cs.AI)

MSC classes: 68T05

ACM classes: I.2.6; I.2.11

Cite as: arXiv:2603.28716 [cs.AI]

(or arXiv:2603.28716v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.28716

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Songjun Tu
[v1] Mon, 30 Mar 2026 17:32:11 UTC (857 KB)
