Provably Extracting the Features from a General Superposition
arXiv:2512.15987v2 Announce Type: replace-cross
Abstract: It is widely believed that complex machine learning models generally encode features through linear representations. This is the foundational hypothesis behind a vast body of work on interpretability. A key challenge toward extracting interpretable features, however, is that they exist in superposition. In this work, we study the question of extracting features in superposition from a learning-theoretic perspective. We start with the following fundamental setting: we are given query access to a function \[ f(x)=\sum_{i=1}^n \sigma_i(v_i^\top x), \] where each unit vector $v_i$ encodes a feature direction and $\sigma_i:\mathbb{R}\to\mathbb{R}$ is an arbitrary response function, and our goal is to recover the $v_i$ and the function $f$. In learning-theoretic terms, superposition refers to the \emph{overcomplete regime}, where the number of features exceeds the underlying dimension (i.e., $n > d$); this regime has proven especially challenging for typical algorithmic approaches. Our main result is an efficient query algorithm that, from noisy oracle access to $f$, identifies all feature directions whose responses are non-degenerate and reconstructs the function $f$. Crucially, our algorithm works in a significantly more general setting than all related prior results: we allow for essentially arbitrary superpositions, requiring only that $v_i$ and $v_j$ are not nearly identical for $i \neq j$, and allowing for general response functions $\sigma_i$. At a high level, our algorithm introduces an approach for searching in Fourier space by iteratively refining the search space to locate the hidden directions $v_i$.
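To make the query-access setting concrete, here is a minimal Python sketch of such an oracle in the overcomplete regime $n > d$. The dimensions, the random choice of directions, the particular response functions, and the Gaussian noise level are all illustrative assumptions; this sketches the problem setup only, not the paper's recovery algorithm.

```python
import numpy as np

# Illustrative instance (assumed values, not from the paper):
# ambient dimension d, number of features n > d (superposition).
rng = np.random.default_rng(0)
d, n = 8, 20

# Unit feature directions v_1, ..., v_n as the columns of V.
V = rng.standard_normal((d, n))
V /= np.linalg.norm(V, axis=0)

# A few example response functions sigma_i (the model allows them
# to be essentially arbitrary).
sigmas = [np.tanh, np.sin, lambda t: np.maximum(t, 0.0)]

def f(x, noise=1e-3):
    """Noisy oracle access to f(x) = sum_i sigma_i(v_i^T x)."""
    projections = V.T @ x  # v_i^T x for every feature i
    clean = sum(sigmas[i % len(sigmas)](projections[i]) for i in range(n))
    return clean + noise * rng.standard_normal()

# A recovery algorithm may only query f; it never sees V or the sigmas.
x = rng.standard_normal(d)
print(f(x))
```

Note that because $n > d$, the columns of V are necessarily linearly dependent, which is what makes disentangling the individual directions from queries to $f$ alone nontrivial.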
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Cite as: arXiv:2512.15987 [cs.LG]
(or arXiv:2512.15987v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2512.15987
arXiv-issued DOI via DataCite
Submission history
From: Allen Liu
[v1] Wed, 17 Dec 2025 21:42:32 UTC (40 KB)
[v2] Tue, 31 Mar 2026 03:55:06 UTC (43 KB)