Baby Scale: Investigating Models Trained on Individual Children's Language Input
Abstract: Modern language models (LMs) must be trained on many orders of magnitude more words of training data than human children receive before they begin to produce useful behavior. Assessing the nature and origins of this "data gap" requires benchmarking LMs on human-scale datasets to understand how linguistic knowledge emerges from children's natural training data. Using transcripts from the BabyView dataset (videos from children ages 6-36 months), we investigate (1) scaling performance at child-scale data regimes, (2) variability in model performance across datasets from different children's experiences and linguistic predictors of dataset quality, and (3) relationships between model and child language learning outcomes. LMs trained on child data show acceptable scaling for grammar tasks, but lower scaling on semantic and world knowledge tasks than models trained on synthetic data; we also observe substantial variability on data from different children. Beyond dataset size, performance is most associated with a combination of distributional and interactional linguistic features, broadly consistent with what makes high-quality input for child language development. Finally, model likelihoods for individual words correlate with children's learning of those words, suggesting that properties of child-directed input may influence both model learning and human language development. Overall, understanding what properties make language data efficient for learning can enable more powerful small-scale language models while also shedding light on human language acquisition.
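The word-level analysis in (3) can be sketched as a rank correlation between per-word model likelihoods and children's age of acquisition. This is a minimal illustrative sketch, not the paper's method or data: the word list, negative log-likelihoods, and age-of-acquisition values below are invented for demonstration.

```python
# Hypothetical sketch: correlating per-word model likelihoods with
# children's age of acquisition (AoA). All values are illustrative
# assumptions, not data from the paper.
from statistics import mean

def spearman(xs, ys):
    """Spearman rank correlation (assumes no ties, as in this toy data)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Toy pairing: lower mean negative log-likelihood (the model finds the
# word more predictable) alongside earlier age of acquisition in months.
words = ["ball", "dog", "milk", "because", "yesterday"]
model_nll = [2.1, 2.4, 2.9, 5.0, 6.3]   # per-word mean NLL (illustrative)
child_aoa = [14, 15, 17, 30, 34]        # age of acquisition, months (illustrative)

rho = spearman(model_nll, child_aoa)
print(f"Spearman rho = {rho:.2f}")
```

A positive correlation here would mean that words the model finds harder to predict are also learned later by children, which is the direction of association the abstract describes.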
Comments: Code and data at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2603.29522 [cs.CL]
(or arXiv:2603.29522v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29522
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Steven Y. Feng [v1] Tue, 31 Mar 2026 10:06:24 UTC (1,087 KB)