Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: ClaudeBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: Claude

BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2512.10932v2 Announce Type: replace-cross Abstract: Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that extensively improves upon BabyVLM-V1 through a longitudinal, multifaceted pretraining set, a versatile model, and, most importantly, DevCV Toolbox for cognitive evaluation. The pretraining set maximizes coverage while minimizing curation of a longitudinal, infant-centric audiovisual corpus, yielding v — Shengao Wang, Wenqi Wang, Zecheng Wang, Max Whitton, Michael Wakeham, Arjun Chandra, Joey Huang, Pengyue Zhu, Helen Chen, David Li, Jeffrey Li, Shawn Li, Andrew Zagula, Amy Zhao, Andrew Zhu, Sayaka Nakamura, Yuki Yamamoto, Jerry Jun Yokono, Aaron Mueller, Bryan A. Plummer, Kate Saenko, Venkatesh Saligrama, Boqing Gong

Authors:Shengao Wang, Wenqi Wang, Zecheng Wang, Max Whitton, Michael Wakeham, Arjun Chandra, Joey Huang, Pengyue Zhu, Helen Chen, David Li, Jeffrey Li, Shawn Li, Andrew Zagula, Amy Zhao, Andrew Zhu, Sayaka Nakamura, Yuki Yamamoto, Jerry Jun Yokono, Aaron Mueller, Bryan A. Plummer, Kate Saenko, Venkatesh Saligrama, Boqing Gong

View PDF HTML (experimental)

Abstract:Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that extensively improves upon BabyVLM-V1 through a longitudinal, multifaceted pretraining set, a versatile model, and, most importantly, DevCV Toolbox for cognitive evaluation. The pretraining set maximizes coverage while minimizing curation of a longitudinal, infant-centric audiovisual corpus, yielding video-utterance, image-utterance, and multi-turn conversational data that mirror infant experiences. DevCV Toolbox adapts all vision-related measures of the recently released NIH Baby Toolbox into a benchmark suite of ten multimodal tasks, covering spatial reasoning, memory, and vocabulary understanding aligned with early children's capabilities. Experimental results show that a compact model pretrained from scratch can achieve competitive performance on DevCV Toolbox, outperforming GPT-4o on some tasks. We hope the principled, unified BabyVLM-V2 framework will accelerate research in developmentally plausible pretraining of vision foundation models.

Comments: Accepted to CVPR 2026 main track

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2512.10932 [cs.CV]

(or arXiv:2512.10932v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2512.10932

arXiv-issued DOI via DataCite

Submission history

From: Shengao Wang [view email] [v1] Thu, 11 Dec 2025 18:57:05 UTC (28,873 KB) [v2] Sun, 29 Mar 2026 20:20:57 UTC (28,990 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
BabyVLM-V2:…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 229 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers