GEditBench v2: A Human-Aligned Benchmark for General Image Editing

HuggingFace Papers, March 30, 2026

A new benchmark and evaluation model for image editing are introduced to better assess visual consistency and human alignment in complex editing tasks. (12 upvotes on HuggingFace)

Published on Mar 30

Abstract

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, structure and semantic coherence between edited and original images. To address these limitations, we introduce GEditBench v2, a comprehensive benchmark with 1,200 real-world user queries spanning 23 tasks, including a dedicated open-set category for unconstrained, out-of-distribution editing instructions beyond predefined tasks. Furthermore, we propose PVC-Judge, an open-source pairwise assessment model for visual consistency, trained via two novel region-decoupled preference data synthesis pipelines. Besides, we construct VCReward-Bench using expert-annotated preference pairs to assess the alignment of PVC-Judge with human judgments on visual consistency evaluation. Experiments show that our PVC-Judge achieves state-of-the-art evaluation performance among open-source models and even surpasses GPT-5.1 on average. Finally, by benchmarking 16 frontier editing models, we show that GEditBench v2 enables more human-aligned evaluation, revealing critical limitations of current models, and providing a reliable foundation for advancing precise image editing.
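The abstract describes checking a pairwise judge against expert-annotated preference pairs. As a rough illustration only, agreement between a judge and human annotators on such pairs might be computed as in the sketch below. The record layout, the `pairwise_agreement` helper, and the sample labels are all hypothetical; none of them come from the paper or its released code, and the paper's actual protocol may differ.

```python
from typing import Iterable, Literal, Tuple

# Each pair compares two edited images, "A" and "B", for the same source
# image and instruction. This layout is an assumption for illustration.
Choice = Literal["A", "B"]


def pairwise_agreement(pairs: Iterable[Tuple[Choice, Choice]]) -> float:
    """Return the fraction of pairs where the judge's pick matches the human's.

    Each item is (human_choice, judge_choice).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one preference pair")
    matches = sum(1 for human, judge in pairs if human == judge)
    return matches / len(pairs)


# Hypothetical labels, not data from VCReward-Bench.
sample = [("A", "A"), ("B", "A"), ("B", "B"), ("A", "A")]
print(f"agreement = {pairwise_agreement(sample):.2f}")
```

A simple match rate like this treats every pair equally; a real evaluation would also need a policy for ties or abstentions, which this sketch does not model.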

Get this paper in your agent:

hf papers read 2603.28547

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
