QPT V2: Masked Image Modeling Advances Visual Scoring

arXiv · March 30, 2026 · 10 min read
Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

Abstract: Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally. Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first MIM-based pretraining framework that offers a unified solution to quality and aesthetics assessment. To perceive both high-level semantics and fine-grained details, we curate the pretraining data. To comprehensively cover quality- and aesthetics-related factors, we introduce degradation. To capture multi-scale quality and aesthetic information, we modify the model structure. Extensive experiments on 11 downstream benchmarks show that QPT V2 outperforms current state-of-the-art approaches and other pretraining paradigms.
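The abstract names masked image modeling and degradation as ingredients but does not say how QPT V2 implements either. As a rough illustration only, the sketch below builds one generic MAE-style training sample: degrade an image, split it into patches, hide a random subset, and score a reconstruction on the hidden patches alone. Everything here is an assumption rather than the authors' pipeline: grayscale images stored as nested lists, additive Gaussian noise as the single degradation, 4x4 patches, a 75% mask ratio, and reconstructing the degraded input. All function names are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical types for this sketch: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]
Patch = List[float]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Example degradation (assumption): additive Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def patchify(img: Image, p: int) -> List[Patch]:
    """Split an H x W image into non-overlapping p x p patches, each flattened row-major."""
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    return [
        [img[top + i][left + j] for i in range(p) for j in range(p)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def random_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Generic random patch masking; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def make_mim_sample(
    img: Image, p: int = 4, mask_ratio: float = 0.75, sigma: float = 0.05, seed: int = 0
) -> Tuple[List[Patch], List[Patch], List[bool]]:
    """Degrade, patchify, and mask one image.

    Returns (visible patches fed to the encoder, masked patches used as
    reconstruction targets, the boolean mask). Reconstructing the degraded
    input is an assumption of this sketch, not a detail from the abstract.
    """
    rng = random.Random(seed)
    degraded = add_gaussian_noise(img, sigma, rng)
    patches = patchify(degraded, p)
    mask = random_mask(len(patches), mask_ratio, rng)
    visible = [x for x, m in zip(patches, mask) if not m]
    targets = [x for x, m in zip(patches, mask) if m]
    return visible, targets, mask


def masked_mse(pred: List[Patch], targets: List[Patch]) -> float:
    """Mean squared reconstruction error over the masked patches only."""
    n = sum(len(t) for t in targets)
    if n == 0:
        return 0.0
    return sum((a - b) ** 2 for pt, tt in zip(pred, targets) for a, b in zip(pt, tt)) / n


if __name__ == "__main__":
    # A 16x16 synthetic gradient image gives 16 patches of 4x4; 75% masking hides 12.
    image = [[(i + j) / 30 for j in range(16)] for i in range(16)]
    visible, targets, mask = make_mim_sample(image)
    print(len(visible), len(targets))  # prints: 4 12
```

Computing the loss only on hidden patches is what forces the encoder to infer missing content from context. Swapping in other degradations (blur, compression, color shifts) would only change `add_gaussian_noise`.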

Comments: 8 pages, 6 figures. Accepted at ACM MM 2024.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Cite as: arXiv:2407.16541 [cs.CV]

(or arXiv:2407.16541v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2407.16541

arXiv-issued DOI via DataCite

Submission history

From: Qizhi Xie
[v1] Tue, 23 Jul 2024 14:53:47 UTC (6,312 KB)
[v2] Fri, 27 Mar 2026 03:13:37 UTC (4,581 KB)