SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation
Abstract: This paper proposes the synthetic long-video meta-evaluation (SLVMEval), a benchmark for meta-evaluating text-to-video (T2V) evaluation systems. The proposed SLVMEval benchmark focuses on assessing these systems on videos of up to 10,486 s (approximately 3 h). The benchmark targets a fundamental requirement, namely, whether the systems can accurately assess video quality in settings that are easy for humans to assess. We adopt a pairwise comparison-based meta-evaluation framework. Building on dense video-captioning datasets, we synthetically degrade source videos to create controlled "high-quality versus low-quality" pairs across 10 distinct aspects. Then, we employ crowdsourcing to filter and retain only those pairs in which the degradation is clearly perceptible, thereby establishing an effective final testbed. Using this testbed, we assess the reliability of existing evaluation systems in ranking these pairs. Experimental results demonstrate that human evaluators can identify the better long video with 84.7%-96.8% accuracy, and in nine of the 10 aspects, the accuracy of these systems falls short of human assessment, revealing weaknesses in text-to-long-video evaluation.
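The pairwise meta-evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pair` record, the `pairwise_accuracy` function, the aspect labels, and the toy scores are all hypothetical, and the rule that a tie counts as an incorrect ranking is an assumption made only for this example. The idea shown is that an evaluation system ranks a pair correctly when it scores the source video above its synthetically degraded counterpart, and accuracy is reported per aspect.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class Pair(NamedTuple):
    """One hypothetical benchmark item: a source video and its degraded version."""
    aspect: str    # placeholder label for the degraded aspect
    prompt: str    # text associated with both videos
    source: str    # identifier of the higher-quality (source) video
    degraded: str  # identifier of the synthetically degraded video


def pairwise_accuracy(
    pairs: Iterable[Pair],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return, per aspect, the fraction of pairs the scorer ranks correctly.

    `score(prompt, video_id)` stands in for any T2V evaluation system.
    Assumption: a tie is counted as an incorrect ranking.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for pair in pairs:
        total[pair.aspect] += 1
        if score(pair.prompt, pair.source) > score(pair.prompt, pair.degraded):
            correct[pair.aspect] += 1
    return {aspect: correct[aspect] / total[aspect] for aspect in total}


# Toy data: made-up scores, not results from the paper.
toy_scores = {
    "a_src": 0.9, "a_deg": 0.4,  # ranked correctly
    "b_src": 0.5, "b_deg": 0.7,  # ranked incorrectly
    "c_src": 0.8, "c_deg": 0.8,  # tie, counted as incorrect here
}
toy_pairs = [
    Pair("aspect_1", "prompt", "a_src", "a_deg"),
    Pair("aspect_1", "prompt", "b_src", "b_deg"),
    Pair("aspect_2", "prompt", "c_src", "c_deg"),
]
result = pairwise_accuracy(toy_pairs, lambda prompt, video: toy_scores[video])
print(result)
```

Under this sketch, the same function could be applied to human judgments and to automatic evaluators alike, which makes the per-aspect comparison reported in the abstract straightforward to tabulate.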
Comments: Accepted to CVPR 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29186 [cs.CV]
(or arXiv:2603.29186v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.29186
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ryosuke Matsuda
[v1] Tue, 31 Mar 2026 02:51:30 UTC (3,593 KB)