
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models

arXiv, March 31, 2026

arXiv:2510.03721v2 Announce Type: replace-cross

Authors: Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata


Abstract: Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations are produced through validated automatic labeling pipelines combining object detection, multimodal captioning, and finetuned classifiers. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content. We also show that a linear fit predicts 60-70% of gender bias in CLIP and Stable Diffusion from direct co-occurrences in the data. Our resources establish the first large-scale empirical link between dataset composition and downstream model bias. Code is available at this https URL.
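The linear-fit claim above can be illustrated with a minimal sketch. It assumes that "predicts 60-70%" refers to the share of variance explained (R²) by an ordinary least-squares line; the abstract does not state this explicitly. The function names and the numbers below are hypothetical toy values for illustration, not data or code from the paper.

```python
# Minimal sketch (assumption: "predicts X% of bias" is read as R^2 of a
# least-squares line). All names and numbers are hypothetical toy values.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-concept values: x is the gender skew of a concept's
# co-occurrences in the training data, y is the skew measured in a model.
data_skew = [-0.8, -0.4, 0.0, 0.3, 0.7]
model_skew = [-0.5, -0.35, 0.1, 0.1, 0.5]

slope, intercept = fit_line(data_skew, model_skew)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"R^2={r_squared(data_skew, model_skew, slope, intercept):.3f}")
```

Under this reading, an R² of 0.6 to 0.7 would mean that most, but not all, of the variation in measured model bias tracks the co-occurrence statistics of the training data.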

Comments: ICLR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

Cite as: arXiv:2510.03721 [cs.CV]

(or arXiv:2510.03721v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2510.03721

arXiv-issued DOI via DataCite

Submission history

From: Leander Girrbach
[v1] Sat, 4 Oct 2025 07:51:59 UTC (5,232 KB)
[v2] Sun, 29 Mar 2026 11:37:56 UTC (3,566 KB)
