
CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions

arXiv · March 30, 2026

arXiv:2603.26174v1 · Announce Type: new
Authors: Chonghuinan Wang, Zihan Chen, Yuxiang Wei, Tianyi Jiang, Xiaohe Wu, Fan Li, Wangmeng Zuo, Hongxun Yao


Abstract: Instruction-based multimodal image manipulation has recently made rapid progress. However, existing evaluation methods lack a systematic, human-aligned framework for assessing model performance on complex and creative editing tasks. To address this gap, we propose CREval, a fully automated question-answer (QA)-based evaluation pipeline that overcomes the incompleteness and poor interpretability of opaque Multimodal Large Language Model (MLLM) scoring. Simultaneously, we introduce CREval-Bench, a comprehensive benchmark specifically designed for creative image manipulation under complex instructions. CREval-Bench covers three categories and nine creative dimensions, comprising over 800 editing samples and 13K evaluation queries. Leveraging this pipeline and benchmark, we systematically evaluate a diverse set of state-of-the-art open-source and closed-source models. The results reveal that while closed-source models generally outperform open-source ones on complex and creative tasks, all models still struggle to complete such edits effectively. In addition, user studies demonstrate strong consistency between CREval's automated metrics and human judgments. CREval therefore provides a reliable foundation for evaluating image editing models on complex and creative image manipulation tasks and highlights key challenges and opportunities for future research.

Comments: Accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26174 [cs.CV]

(or arXiv:2603.26174v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26174

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chonghuinan Wang
[v1] Fri, 27 Mar 2026 08:42:09 UTC (37,557 KB)
