
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning

Submitted on 16 May 2025 (v1), last revised 29 Mar 2026 (this version, v3)

Authors: Yuanzhao Zhang, William Gilpin


Abstract: Recent time-series foundation models exhibit strong abilities to predict physical systems. These abilities include zero-shot forecasting, in which a model forecasts future states of a system given only a short trajectory as context, without knowledge of the underlying physics. Here, we show that foundation models often forecast through a simple parroting strategy, and when they are not parroting they exhibit some shared failure modes such as converging to the mean. As a result, a naive context parroting model that copies directly from the context scores higher than leading time-series foundation models on predicting a diverse range of dynamical systems, including low-dimensional chaos, turbulence, coupled oscillators, and electrocardiograms, at a tiny fraction of the computational cost. We draw a parallel between context parroting and induction heads, which explains recent works showing that large language models can often be repurposed for time series forecasting. Our dynamical systems perspective also ties the scaling between forecast accuracy and context length to the fractal dimension of the underlying chaotic attractor, providing insight into previously observed in-context neural scaling laws. By revealing the performance gaps and failure modes of current time-series foundation models, context parroting can guide the design of future foundation models and help identify in-context learning strategies beyond parroting.
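To make the idea of "copying directly from the context" concrete, here is a minimal sketch of one way a parroting forecaster could work. It is an illustration only, not the authors' implementation: the function name `parrot_forecast`, the matching-window length `window`, the Euclidean distance, and the choice to keep copying from the forecast itself once the context runs out are all assumptions introduced here. The sketch finds the earlier stretch of the context that most resembles the most recent `window` points, then replays whatever followed that stretch.

```python
import numpy as np


def parrot_forecast(context, horizon, window=8):
    """Hypothetical parroting baseline: replay what followed the best past match.

    context: array-like of shape (T,) or (T, d), the observed trajectory.
    horizon: number of future steps to produce.
    window:  length of the most recent segment used for matching (assumed).
    """
    x = np.asarray(context, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a scalar series as a 1-dimensional trajectory
    T = x.shape[0]
    if T < window + 1:
        raise ValueError("context must be longer than the matching window")

    query = x[T - window:]  # the most recent `window` points

    # Candidate windows start at s = 0 .. T - window - 1, so each one has at
    # least one context point after it, and the query itself is excluded.
    dists = np.array(
        [np.linalg.norm(x[s:s + window] - query) for s in range(T - window)]
    )
    best = int(np.argmin(dists))
    nxt = best + window  # index of the first point after the matched window

    # Copy forward. Because nxt < T, the source index nxt + k always lies
    # before the slot T + k being written, so when the copy runs past the end
    # of the context it continues from points that were already forecast.
    extended = np.concatenate([x, np.empty((horizon, x.shape[1]))])
    for k in range(horizon):
        extended[T + k] = extended[nxt + k]

    forecast = extended[T:]
    return forecast[:, 0] if np.ndim(context) == 1 else forecast


# Usage on a synthetic periodic signal (illustrative data, not from the paper).
t = np.arange(200)
series = np.sin(2 * np.pi * t / 25)
pred = parrot_forecast(series[:150], horizon=50, window=8)
print("max abs error:", np.max(np.abs(pred - series[150:])))
```

A forecaster of this kind has no trainable parameters and needs only one pass over the context, which is consistent with the abstract's point about computational cost. On a chaotic system, the quality of such a forecast would depend on how close the best match in the context is to the current state.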

Comments: International Conference on Learning Representations (ICLR 2026)

Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Computational Physics (physics.comp-ph)

Cite as: arXiv:2505.11349 [cs.LG]

(or arXiv:2505.11349v3 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2505.11349

arXiv-issued DOI via DataCite

Submission history

From: Yuanzhao Zhang
[v1] Fri, 16 May 2025 15:14:47 UTC (753 KB)
[v2] Thu, 18 Sep 2025 22:10:25 UTC (910 KB)
[v3] Sun, 29 Mar 2026 21:01:24 UTC (2,943 KB)

Original source

arXiv

https://arxiv.org/abs/2505.11349
