
SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation

arXiv cs.CV · Ryosuke Matsuda, Keito Kudo, Haruto Yoshida, Nobuyuki Shimizu, Jun Suzuki · April 1, 2026 · 1 min read


Abstract: This paper proposes the synthetic long-video meta-evaluation (SLVMEval), a benchmark for meta-evaluating text-to-video (T2V) evaluation systems. The proposed SLVMEval benchmark focuses on assessing these systems on videos of up to 10,486 s (approximately 3 h). The benchmark targets a fundamental requirement, namely, whether the systems can accurately assess video quality in settings that are easy for humans to assess. We adopt a pairwise comparison-based meta-evaluation framework. Building on dense video-captioning datasets, we synthetically degrade source videos to create controlled "high-quality versus low-quality" pairs across 10 distinct aspects. Then, we employ crowdsourcing to filter and retain only those pairs in which the degradation is clearly perceptible, thereby establishing an effective final testbed. Using this testbed, we assess the reliability of existing evaluation systems in ranking these pairs. Experimental results demonstrate that human evaluators can identify the better long video with 84.7%-96.8% accuracy, and in nine of the 10 aspects, the accuracy of these systems falls short of human assessment, revealing weaknesses in text-to-long-video evaluation.
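The pairwise meta-evaluation framework described above can be sketched as follows: given degraded/original pairs grouped by aspect, an evaluation system is scored by how often it ranks the high-quality video above its degraded counterpart. This is a minimal illustration only; the `VideoPair` structure, aspect names, and scoring interface are hypothetical, not the paper's actual pipeline.

```python
# Sketch of pairwise comparison-based meta-evaluation: an evaluator is
# "correct" on a pair when it scores the high-quality video above the
# synthetically degraded one. Names and data layout are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VideoPair:
    aspect: str  # one of the benchmark's 10 degradation aspects
    high: str    # identifier of the original (high-quality) video
    low: str     # identifier of the degraded counterpart


def pairwise_accuracy(pairs: List[VideoPair],
                      score: Callable[[str], float]) -> Dict[str, float]:
    """Per-aspect fraction of pairs ranked correctly by the evaluator."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for p in pairs:
        total[p.aspect] = total.get(p.aspect, 0) + 1
        if score(p.high) > score(p.low):
            correct[p.aspect] = correct.get(p.aspect, 0) + 1
    return {a: correct.get(a, 0) / total[a] for a in total}


# Toy usage with a trivial stand-in "evaluator" (string length as score):
pairs = [VideoPair("temporal-consistency", "vid_high_001", "v_low_001")]
acc = pairwise_accuracy(pairs, score=len)
```

A system whose per-aspect accuracy stays well below the 84.7%-96.8% human range on such pairs would, by the paper's criterion, be considered unreliable for that aspect.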

Comments: Accepted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29186 [cs.CV]

(or arXiv:2603.29186v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29186

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ryosuke Matsuda [v1] Tue, 31 Mar 2026 02:51:30 UTC (3,593 KB)

