MemGuard-Alpha: Detecting and Filtering Memorization-Contaminated Signals in LLM-Based Financial Forecasting via Membership Inference and Cross-Model Disagreement
arXiv:2603.26797v1 Announce Type: new. Authors: Anisha Roy, Dip Roy
Abstract: Large language models (LLMs) are increasingly used to generate financial alpha signals, yet growing evidence shows that LLMs memorize historical financial data from their training corpora, producing spurious predictive accuracy that collapses out-of-sample. This memorization-induced look-ahead bias threatens the validity of LLM-based quantitative strategies. Prior remedies -- model retraining and input anonymization -- are either prohibitively expensive or introduce significant information loss. No existing method offers practical, zero-cost signal-level filtering for real-time trading. We introduce MemGuard-Alpha, a post-generation framework comprising two algorithms: (i) the MemGuard Composite Score (MCS), which combines five membership inference attack (MIA) methods with temporal proximity features via logistic regression, achieving Cohen's d = 18.57 for contamination separation (d = 0.39-1.37 using MIA features alone); and (ii) Cross-Model Memorization Disagreement (CMMD), which exploits variation in training cutoff dates across LLMs to separate memorized signals from genuine reasoning. Evaluated across seven LLMs (124M-7B parameters), 50 S&P 100 stocks, 42,800 prompts, and five MIA methods over 5.5 years (2019-2024), CMMD achieves a Sharpe ratio of 4.11 versus 2.76 for unfiltered signals (49% improvement). Clean signals produce 14.48 bps average daily return versus 2.13 bps for tainted signals (7x difference). A striking crossover pattern emerges: in-sample accuracy rises with contamination (40.8% to 52.5%) while out-of-sample accuracy falls (47% to 42%), providing direct evidence that memorization inflates apparent accuracy at the cost of generalization.
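The abstract reports effect sizes as Cohen's d and summarizes the results as relative improvements. As a rough illustration only, the sketch below computes Cohen's d with the standard pooled-standard-deviation formula and re-derives the two headline ratios from the figures quoted above. The function names and the toy score lists are hypothetical and are not taken from the paper; the paper's exact estimator for d is not stated in the abstract, so the pooled-variance form here is an assumption.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (sample variances).

    Assumption: the standard pooled form; the abstract does not say
    which variant of the effect size the authors used.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def relative_improvement(new, old):
    """Fractional change of `new` over `old`."""
    return (new - old) / old


# Hypothetical contamination scores, invented purely for illustration.
contaminated_scores = [0.90, 0.80, 0.85, 0.95]
clean_scores = [0.20, 0.10, 0.15, 0.25]
d = cohens_d(contaminated_scores, clean_scores)

# Figures quoted in the abstract.
sharpe_gain = relative_improvement(4.11, 2.76)  # about 0.489, reported as 49%
return_ratio = 14.48 / 2.13                     # about 6.8, reported as 7x

print(f"toy Cohen's d: {d:.2f}")
print(f"Sharpe improvement: {sharpe_gain:.1%}")
print(f"clean vs tainted return ratio: {return_ratio:.2f}")
```

The arithmetic confirms that the quoted 49% and 7x figures are rounded values of 48.9% and roughly 6.8x.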
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.26797 [cs.LG]
(or arXiv:2603.26797v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.26797
arXiv-issued DOI via DataCite
Submission history
From: Dip Roy [v1] Thu, 26 Mar 2026 00:35:25 UTC (1,152 KB)
