MIT scientists investigate memorization risk in the age of clinical AI

MIT ML Newsby Alex Ouyang | Abdul Latif Jameel Clinic for Machine Learning in HealthJanuary 5, 20261 min read0 views

New research demonstrates how AI models can be tested to ensure they don’t cause harm by revealing anonymized patient health data.

What is patient privacy for? The Hippocratic Oath, thought to be one of the earliest and most widely known medical ethics texts in the world, reads: “Whatever I see or hear in the lives of my patients, whether in connection with my professional practice or not, which ought not to be spoken of outside, I will keep secret, as considering all such things to be private.”

As privacy becomes increasingly scarce in the age of data-hungry algorithms and cyberattacks, medicine is one of the few remaining domains where confidentiality remains central to practice, enabling patients to trust their physicians with sensitive information.

But a paper co-authored by MIT researchers investigates how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information. The work, which was recently presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS), recommends a rigorous testing setup to ensure targeted prompts cannot reveal information, emphasizing that leakage must be evaluated in a health care context to determine whether it meaningfully compromises patient privacy.

Foundation models trained on EHRs should normally generalize knowledge to make better predictions, drawing upon many patient records. But in “memorization,” the model draws upon a singular patient record to deliver its output, potentially violating patient privacy. Notably, foundation models are already known to be prone to data leakage.

“Knowledge in these high-capacity models can be a resource for many communities, but adversarial attackers can prompt a model to extract information on training data,” says Sana Tonekaboni, a postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and first author of the paper. Given the risk that foundation models could also memorize private data, she notes, “this work is a step towards ensuring there are practical evaluation steps our community can take before releasing models.”

To conduct research on the potential risk EHR foundation models could pose in medicine, Tonekaboni approached MIT Associate Professor Marzyeh Ghassemi, who is a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and a member of the Computer Science and Artificial Intelligence Lab. Ghassemi, a faculty member in the MIT Department of Electrical Engineering and Computer Science and Institute for Medical Engineering and Science, runs the Healthy ML group, which focuses on robust machine learning in health.

Just how much information does a bad actor need to expose sensitive data, and what are the risks associated with the leaked information? To assess this, the research team developed a series of tests that they hope will lay the groundwork for future privacy evaluations. These tests are designed to measure various types of uncertainty, and assess their practical risk to patients by measuring various tiers of attack possibility.

“We really tried to emphasize practicality here; if an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?” says Ghassemi.

With the inevitable digitization of medical records, data breaches have become more commonplace. In the past 24 months, the U.S. Department of Health and Human Services has recorded 747 data breaches of health information affecting more than 500 individuals, with the majority categorized as hacking/IT incidents.

Patients with unique conditions are especially vulnerable, given how easy it is to pick them out. “Even with de-identified data, it depends on what sort of information you leak about the individual,” Tonekaboni says. “Once you identify them, you know a lot more.”

In their structured tests, the researchers found that the more information the attacker has about a particular patient, the more likely the model is to leak information. They demonstrated how to distinguish model generalization cases from patient-level memorization, to properly assess privacy risk.

The paper also emphasized that some leaks are more harmful than others. For instance, a model revealing a patient’s age or demographics could be characterized as a more benign leakage than the model revealing more sensitive information, like an HIV diagnosis or alcohol abuse.

The researchers note that patients with unique conditions are especially vulnerable given how easy it is to pick them out, which may require higher levels of protection. “Even with de-identified data, it really depends on what sort of information you leak about the individual,” Tonekaboni says. The researchers plan to expand the work to become more interdisciplinary, adding clinicians and privacy experts as well as legal experts.

“There’s a reason our health data is private,” Tonekaboni says. “There’s no reason for others to know about it.”

This work supported by the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, Wallenberg AI, the Knut and Alice Wallenberg Foundation, the U.S. National Science Foundation (NSF), a Gordon and Betty Moore Foundation award, a Google Research Scholar award, and the AI2050 Program at Schmidt Sciences. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

Original source

MIT ML News

https://news.mit.edu/2026/mit-scientists-investigate-memorization-risk-clinical-ai-0105

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modelresearch

Research Papers

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI - WSJ

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI WSJ

GNews AI manufacturing

1mabout 1 month ago

Analyst NewsFresh

The end of predictable storage economics and what that means for infrastructure planning

The enterprise storage market is currently experiencing unprecedented SSD price volatility driven by massive AI demand and multi-year capacity commitments from hyperscalers. Between Q2 2025 and Q1 2026, for instance, 30TB TLC SSD pricing increased by 257% (from $3,062 to $10,950), while HDD pricing remained relatively stable, increasing by 35%. The situation is challenging some fundamental, long-term assumptions about storage architecture strategy, particularly the collective experience that flash pricing declines over time. Until recently, it was a trend fully supported by the facts, and even factoring in cyclical variation, long-term cost curves have generally supported predictable cost-per-GB reductions. This generally solid predictability has underpinned everything from multi-year infr

CIO Magazine

4mabout 3 hours ago

ModelsLive

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

I mean, I have 40GB of Vram and I still cannot fit the entire Unsloth Gemma-4-31B-it-UD-Q8 (35GB) even at 2K context size unless I quantize KV to Q4 with 2K context size? WTF? For comparison, I can fit the entire UD-Q8 Qwen3.5-27B at full context without KV quantization! If I have to run a Q4 Gemma-4-31B-it-UD with a Q8 KV cache, then I am better off just using Qwen3.5-27B. After all, the latter beats the former in basically all benchmarks. What's your experience with the Gemma-4 models so far? submitted by /u/Iory1998 [link] [comments]

Reddit r/LocalLLaMA

1mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 132 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

ModelsFresh

The True Price of Every ChatGPT Prompt Uncovered - Adventure Magazine

The True Price of Every ChatGPT Prompt Uncovered Adventure Magazine

GNews AI energy

1mabout 11 hours ago

ModelsLive

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

Reddit r/LocalLLaMA

1mabout 1 hour ago

ModelsLive

DenseNet Paper Walkthrough: All Connected

When we try to train a very deep neural network model, one issue that we might encounter is the vanishing gradient problem. This is essentially a problem where the weight update of a model during training slows down or even stops, hence causing the model not to improve. When a network is very deep, the [ ] The post DenseNet Paper Walkthrough: All Connected appeared first on Towards Data Science .

Towards Data Science

23m26 minutes ago

Models

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models WSJ

GNews AI energy

1m3 days ago