Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Trace2Skill enables scalable skill generation for LLM agents by analyzing diverse execution traces in parallel and consolidating them into transferable, declarative skills without parameter updates or external modules. (4 upvotes on HuggingFace)
Published on Mar 26
Authors:
,
,
,
,
,
Abstract
Trace2Skill enables scalable skill generation for LLM agents by analyzing diverse execution traces in parallel and consolidating them into transferable, declarative skills without parameter updates or external modules.
AI-generated summary
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet, VisionQA and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to OOD settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills -- requiring no parameter updates, no external retrieval modules, and utilizing open-source models as small as 35B parameters.
View arXiv page View PDF Add to collection
Get this paper in your agent:
hf papers read 2603.25158
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
No model linking this paper
Cite arxiv.org/abs/2603.25158 in a model README.md to link it from this page.
Datasets citing this paper 0
No dataset linking this paper
Cite arxiv.org/abs/2603.25158 in a dataset README.md to link it from this page.
Spaces citing this paper 0
No Space linking this paper
Cite arxiv.org/abs/2603.25158 in a Space README.md to link it from this page.
Collections including this paper 7
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Is Turboquant really a game changer?
I am currently utilizing qwen3.5 and Gemma 4 model. Realized Gemma 4 requires 2x ram for same context length. As far as I understand, what turbo quant gives is quantizing kv cache into about 4 bit and minimize the loses But Q8 still not lose the context that much so isn't kv cache ram for qwen 3.5 q8 and Gemma 4 truboquant is the same? Is turboquant also applicable in qwen's cache architecture? because as far as I know they didn't tested it in qwen3.5 style kv cache in their paper. Just curious, I started to learn local LLM recently submitted by /u/Interesting-Print366 [link] [comments]

Found how to toggle reasoning mode for Gemma in LM-Studio!
I’ve figured out how to trigger the reasoning process by adding "/think" to the system prompt. Heads up: the thought tags have an unusual pipe ( | ) placement, which is why many LLM fail to parse the reasoning section correctly. So Start String is : " thought" And End String is " " Here is the Jinja template: https://pastebin.com/MGmD8UiC Tested and working with the 26B and 31B versions. submitted by /u/Adventurous-Paper566 [link] [comments]

AI companions can comfort lonely users but may deepen distress over time
AI companions are always available, never judge, never tire and never demand anything in return. If someone is struggling with loneliness, this frictionlessness can seem profoundly appealing. However, new research shows that in the long term, seeking emotional support from an AI companion can pull users away from important human relationships.
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

New Rowhammer attack can grant kernel-level control on Nvidia workstation GPUs
A study from researchers at UNC Chapel Hill and Georgia Tech shows that GDDR6-based Rowhammer attacks can grant kernel-level access to Linux systems equipped with GPUs based on Nvidia's Ampere and Ada Lovelace architectures. The vulnerability appears significantly more severe than what was outlined in a paper last year. Read Entire Article
![[D] ICML Reviewer Acknowledgement](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-matrix-rain-CvjLrWJiXfamUnvj5xT9J9.webp)
[D] ICML Reviewer Acknowledgement
Hi, I'm a little confused about ICML discussion period Does the period for reviewer acknowledging responses have already ended? One of the four reviewers did not present any answer to a paper of mine. Do you know if the reviewer can still change their score before April 7th? There is a reviewer comment that I will answer on Monday. Will the reviewer be able to update the score after seeing my answer? Thanks! submitted by /u/Massive_Horror9038 [link] [comments]

Considerations for growing the pie
Recently some friends and I were comparing growing the pie interventions to an increasing our friends' share of the pie intervention, and at first we mostly missed some general considerations against the latter type. 1. Decision-theoretic considerations The world is full of people with different values working towards their own ends; each of them can choose to use their resources to increase the total size of the pie or to increase their share of the pie. All of them would significantly prefer a world in which resources were used to increase the size of the pie, and this leads to a number [of] compelling justifications for each individual to cooperate. . . . by increasing the size of the pie we create a world which is better for people on average, and from behind the veil of ignorance we s


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!