
MemGuard-Alpha: Detecting and Filtering Memorization-Contaminated Signals in LLM-Based Financial Forecasting via Membership Inference and Cross-Model Disagreement

arXiv, March 31, 2026

arXiv:2603.26797v1, Announce Type: new. Authors: Anisha Roy, Dip Roy


Abstract: Large language models (LLMs) are increasingly used to generate financial alpha signals, yet growing evidence shows that LLMs memorize historical financial data from their training corpora, producing spurious predictive accuracy that collapses out-of-sample. This memorization-induced look-ahead bias threatens the validity of LLM-based quantitative strategies. Prior remedies (model retraining and input anonymization) are either prohibitively expensive or introduce significant information loss. No existing method offers practical, zero-cost signal-level filtering for real-time trading. We introduce MemGuard-Alpha, a post-generation framework comprising two algorithms: (i) the MemGuard Composite Score (MCS), which combines five membership inference attack (MIA) methods with temporal proximity features via logistic regression, achieving Cohen's d = 18.57 for contamination separation (d = 0.39-1.37 using MIA features alone); and (ii) Cross-Model Memorization Disagreement (CMMD), which exploits variation in training cutoff dates across LLMs to separate memorized signals from genuine reasoning. Evaluated across seven LLMs (124M-7B parameters), 50 S&P 100 stocks, 42,800 prompts, and five MIA methods over 5.5 years (2019-2024), CMMD achieves a Sharpe ratio of 4.11 versus 2.76 for unfiltered signals (49% improvement). Clean signals produce 14.48 bps average daily return versus 2.13 bps for tainted signals (7x difference). A striking crossover pattern emerges: in-sample accuracy rises with contamination (40.8% to 52.5%) while out-of-sample accuracy falls (47% to 42%), providing direct evidence that memorization inflates apparent accuracy at the cost of generalization.
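
The abstract reports its results as Cohen's d, Sharpe ratios, and return ratios. The following is a minimal sketch of the textbook definitions of those statistics, together with an arithmetic check of the quoted 49% and 7x figures. It is not the authors' code: the function names, the pooled-standard-deviation form of Cohen's d, the zero risk-free rate, and the 252-day annualization are all assumptions, and the toy scores are invented purely to exercise the functions.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)


# Arithmetic check of the ratios quoted in the abstract.
sharpe_gain = 4.11 / 2.76 - 1   # relative Sharpe improvement of CMMD vs unfiltered
return_ratio = 14.48 / 2.13     # clean vs tainted average daily return (bps)

# Hypothetical toy scores (not data from the paper).
toy_contaminated = [0.9, 0.8, 0.85, 0.95]
toy_clean = [0.2, 0.1, 0.15, 0.25]
d_toy = cohens_d(toy_contaminated, toy_clean)

# Hypothetical toy daily returns (not data from the paper).
toy_returns = [0.001, -0.002, 0.003, 0.0015]
sharpe_toy = annualized_sharpe(toy_returns)

print(f"Sharpe improvement: {sharpe_gain:.1%}")
print(f"Return ratio: {return_ratio:.2f}x")
```

The check shows that the quoted figures are rounded: 4.11 / 2.76 is an improvement of just under 49%, and 14.48 / 2.13 is a ratio of about 6.8, which the abstract states as 7x.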

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.26797 [cs.LG] (or arXiv:2603.26797v1 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.26797 (arXiv-issued DOI via DataCite)

Submission history

From: Dip Roy. [v1] Thu, 26 Mar 2026 00:35:25 UTC (1,152 KB)
