Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
arXiv:2604.02659v1 Announce Type: cross
Abstract: The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.
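The randomized subspace iteration the abstract describes (a Gaussian range sketch refined by power iterations, followed by an exact SVD of the small projected matrix) can be sketched in NumPy as below. This is a minimal illustration of the standard RSI scheme, not the paper's implementation; the function and parameter names are our own, and the toy matrix is constructed with a slowly decaying spectrum to mimic the regime where plain RSVD struggles.

```python
import numpy as np

def rsi_low_rank(W, rank, n_iter=4, oversample=10, seed=0):
    """Rank-`rank` approximation of W via randomized subspace iteration.

    n_iter=0 recovers plain randomized SVD (RSVD); additional power
    iterations sharpen spectral separation when singular values decay
    slowly, at the cost of extra matrix multiplies.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    k = rank + oversample
    # Sketch the range of W with a Gaussian test matrix.
    Q, _ = np.linalg.qr(W @ rng.standard_normal((n, k)))
    for _ in range(n_iter):
        # One power iteration, (W W^T) Q, re-orthogonalized for stability.
        Q, _ = np.linalg.qr(W.T @ Q)
        Q, _ = np.linalg.qr(W @ Q)
    # Exact SVD of the small projected matrix, truncated to `rank`.
    Ub, s, Vt = np.linalg.svd(Q.T @ W, full_matrices=False)
    return Q @ Ub[:, :rank], s[:rank], Vt[:rank, :]

# Toy weight matrix with singular values 1/sqrt(i): slow spectral decay.
rng = np.random.default_rng(1)
m, n, r = 300, 200, 20
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sing = 1.0 / np.arange(1, n + 1) ** 0.5
W = (U0 * sing) @ V0.T

U, s, Vt = rsi_low_rank(W, r, n_iter=4)
err_rsi = np.linalg.norm(W - (U * s) @ Vt, 2)  # spectral-norm error
opt = sing[r]                                  # Eckart-Young optimum
```

Setting `n_iter=0` makes the same function behave as RSVD, so the two methods can be compared directly on one matrix; as `n_iter` grows, `err_rsi` approaches the optimal spectral error `sing[r]`, which is the controllable quality mechanism the abstract refers to.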
Comments: 13 pages
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as: arXiv:2604.02659 [cs.LG]
(or arXiv:2604.02659v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2604.02659
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Farhad Pourkamali-Anaraki [view email] [v1] Fri, 3 Apr 2026 02:47:03 UTC (645 KB)