CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

HuggingFace Papers · March 31, 2026 · 8 min read

CutClaw is an autonomous multi-agent framework that uses multimodal language models to automatically edit long video footage into rhythmic, narratively consistent short videos with synchronized audio and visual elements. (4 upvotes on HuggingFace)

Published on Mar 31

Abstract

Editing video to align with audio has become a distinct creative craft on today's social media platforms. However, manual video editing is time-consuming and repetitive, which has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agent framework that edits hours-long raw footage into meaningful short videos by using multiple Multimodal Language Models (MLLMs) as an agent system. It produces videos that are synchronized to music, follow instructions, and are visually appealing. In detail, our approach begins with a hierarchical multimodal decomposition that captures both fine-grained details and global structure across the visual and audio footage. Then, to ensure narrative consistency, a Playwriter Agent orchestrates the overall storytelling flow and structures the long-term narrative, anchoring visual scenes to musical shifts. Finally, to construct the short edited video, Editor and Reviewer Agents collaboratively optimize the final cut by selecting fine-grained visual content against rigorous aesthetic and semantic criteria. Detailed experiments demonstrate that CutClaw significantly outperforms state-of-the-art baselines in generating high-quality, rhythm-aligned videos. The code is available at: https://github.com/GVCLab/CutClaw.
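To make the idea of anchoring cuts to musical shifts more concrete, here is a minimal Python sketch. It is not the CutClaw implementation: the `Clip` structure, the `score` field (a stand-in for an aesthetic or semantic rating), the `min_score` threshold, and the greedy selection order are all assumptions made for illustration, and the beat times are assumed to come from some separate audio analysis step. The sketch proposes clips in order of score, rejects clips below the threshold, and trims each accepted clip so that it ends on a beat.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Clip:
    """A candidate shot from the raw footage (hypothetical structure)."""
    name: str
    duration: float  # seconds of usable footage
    score: float     # assumed rating in [0, 1]; higher is better


def last_beat_at_or_before(t, beats):
    """Return the latest beat time <= t, or None if t precedes every beat.

    `beats` must be sorted in ascending order.
    """
    i = bisect_right(beats, t)
    return beats[i - 1] if i > 0 else None


def plan_cuts(clips, beats, min_score=0.5):
    """Greedy sketch: propose clips by score, reject weak ones,
    and trim each accepted clip so that it ends on a beat."""
    timeline = []
    cursor = beats[0]  # start the edit on the first beat
    for clip in sorted(clips, key=lambda c: c.score, reverse=True):
        if clip.score < min_score:
            continue  # rejected: rating below the assumed threshold
        end = last_beat_at_or_before(cursor + clip.duration, beats)
        if end is None or end <= cursor:
            continue  # clip is too short to reach the next beat
        timeline.append((clip.name, cursor, end))
        cursor = end
        if cursor >= beats[-1]:
            break  # no music left to fill
    return timeline


beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
clips = [
    Clip("A", 1.2, 0.9),
    Clip("B", 0.3, 0.8),
    Clip("C", 0.9, 0.4),
    Clip("D", 1.1, 0.7),
]
timeline = plan_cuts(clips, beats)
```

With these made-up inputs, clip B is skipped because it cannot reach the next beat and clip C is skipped because its score is below the threshold. Ordering clips purely by score ignores narrative order; that part of the problem is what the paper assigns to its Playwriter Agent, and this sketch does not attempt to model it.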

Get this paper in your agent:

hf papers read 2603.29664

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
