Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
arXiv:2604.02659v1 Announce Type: cross
Abstract: The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.
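The randomized subspace iteration the abstract describes (a Gaussian range sketch refined by power iterations, followed by an exact SVD of the small projected matrix) can be sketched in NumPy as below. This is a minimal illustration of the standard RSI scheme, not the paper's implementation; the function and parameter names are our own, and the toy matrix is constructed with a slowly decaying spectrum to mimic the regime where plain RSVD struggles.

```python
import numpy as np

def rsi_low_rank(W, rank, n_iter=4, oversample=10, seed=0):
    """Rank-`rank` approximation of W via randomized subspace iteration.

    n_iter=0 recovers plain randomized SVD (RSVD); additional power
    iterations sharpen spectral separation when singular values decay
    slowly, at the cost of extra matrix multiplies.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    k = rank + oversample
    # Sketch the range of W with a Gaussian test matrix.
    Q, _ = np.linalg.qr(W @ rng.standard_normal((n, k)))
    for _ in range(n_iter):
        # One power iteration, (W W^T) Q, re-orthogonalized for stability.
        Q, _ = np.linalg.qr(W.T @ Q)
        Q, _ = np.linalg.qr(W @ Q)
    # Exact SVD of the small projected matrix, truncated to `rank`.
    Ub, s, Vt = np.linalg.svd(Q.T @ W, full_matrices=False)
    return Q @ Ub[:, :rank], s[:rank], Vt[:rank, :]

# Toy weight matrix with singular values 1/sqrt(i): slow spectral decay.
rng = np.random.default_rng(1)
m, n, r = 300, 200, 20
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sing = 1.0 / np.arange(1, n + 1) ** 0.5
W = (U0 * sing) @ V0.T

U, s, Vt = rsi_low_rank(W, r, n_iter=4)
err_rsi = np.linalg.norm(W - (U * s) @ Vt, 2)  # spectral-norm error
opt = sing[r]                                  # Eckart-Young optimum
```

Setting `n_iter=0` makes the same function behave as RSVD, so the two methods can be compared directly on one matrix; as `n_iter` grows, `err_rsi` approaches the optimal spectral error `sing[r]`, which is the controllable quality mechanism the abstract refers to.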
Comments: 13 pages
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as: arXiv:2604.02659 [cs.LG]
(or arXiv:2604.02659v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2604.02659
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Farhad Pourkamali-Anaraki [view email] [v1] Fri, 3 Apr 2026 02:47:03 UTC (645 KB)