Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
Read on Google News: https://news.google.com/rss/articles/CBMiuANBVV95cUxQamNrT0NoNFYxYTFFMUhzbWtzSVNVSHBQWVVPQ0ZOR3o1bTJTNFVuMlJLVDhXaHNhRDZGZWZjUUgtcU01WkJSM0hYSWVCQmV3ZllKbjVWcDBuX0pheTQ3QThDc254NElERTZqeXZRdE43UkJaYVlRUk03NDlMa1RlU2NUb2N6c25UMlFwQWhITVo3M3dLV1JNblZvcUxuTV9YUE04S2ZCRGFLaWZhQlNXdzdQd0dRLW5va0YzVjVkY0hyV1NyaDRLOVo2UEFxcTk5QWZhTFduUXZLUVdXN1hkbTFGeWx3YVJUMzR6eThmaExhajJOSTRIY0p0Tmt1Yy0zbE9nREJGV1hLM2xPcGpPd1RaRmFTaGZ4M09HanRGSnJSVF9yN0laang2Ui1fWTNuZjZRWEdseVNXelc3Q0d4eW82SG9DcXg2cW1UQ3pEbmYtdnZ5ZFd5ajhFV2ZWX0dSODlVZEVRVEh2LWgzTDNkRDBlNWI4U1p3cE0xQVJUY1ZKMTUyRnBDQ3RlOTg0X1J6M2hSMW9Da3hlTG54dzJ5cTRIMG5KUG45Q1c4TW41UW5XQzZwQmxmRg?oc=5 (WSJ)

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
arXiv:2604.03258v1 Announce Type: new
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but the billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", which leverages Soft activation sparsity and Low-rAnk decomposition. SoLA can identify and retain a minority of components significantly contributing to inference, while compressing the majority through low-rank decomposition, based on our analysis of the activation pattern in…
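The abstract cuts off before the method's details, but the two ingredients it names (activation-aware selection of important components, plus low-rank factorization of everything else) can be made concrete. The sketch below is a toy under stated assumptions, not SoLA itself: the function name `sola_style_compress`, the mean-absolute-activation importance score, and all parameters are hypothetical, and it applies plain truncated SVD to a single linear layer.

```python
import numpy as np

def sola_style_compress(W, X_calib, keep_frac=0.1, rank=64):
    """Toy activation-aware compression of one linear layer (assumed
    scheme, not SoLA's actual algorithm). Keeps the most-activated
    input channels exact and low-rank-factorizes the remainder.

    W:       (d_out, d_in) weight matrix
    X_calib: (n_samples, d_in) calibration activations into the layer
    """
    d_out, d_in = W.shape
    # Soft activation sparsity: a minority of input channels carries
    # most of the signal, measured here by mean absolute activation.
    importance = np.abs(X_calib).mean(axis=0)
    k = max(1, int(keep_frac * d_in))
    keep = np.argsort(importance)[-k:]            # retained exactly
    rest = np.setdiff1d(np.arange(d_in), keep)    # compressed

    # Low-rank decomposition of the less important columns of W.
    U, s, Vt = np.linalg.svd(W[:, rest], full_matrices=False)
    r = min(rank, len(s))
    A = U[:, :r] * s[:r]    # (d_out, r)
    B = Vt[:r, :]           # (r, len(rest))

    def forward(x):
        # Exact path for important channels, low-rank path for the rest.
        return x[:, keep] @ W[:, keep].T + (x[:, rest] @ B.T) @ A.T
    return forward
```

In this toy version, storage drops from d_out × d_in to d_out × k + r × (d_out + len(rest)); keep_frac and rank trade quality for size, and no gradient step is taken, which is what a training-free method like the one described would require.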

Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation
arXiv:2604.03257v1 Announce Type: new
Abstract: The ability to rigorously estimate the failure rates of large language models (LLMs) is a prerequisite for their safe deployment. Currently, however, practitioners often face a tradeoff between expensive human gold standards and potentially severely biased automatic annotation schemes such as "LLM-as-a-Judge" labeling. In this paper, we propose a new, practical, and efficient approach to LLM failure rate estimation based on constrained maximum-likelihood estimation (MLE). Our method integrates three distinct signal sources: (i) a small, high-quality human-labeled calibration set, (ii) a large corpus of LLM-judge annotations, and, most importantly, (iii) additional side information via domain-specific constraints derived from known bounds on j…
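To make the three signal sources concrete, here is a minimal constrained-MLE sketch under an assumed two-parameter judge-noise model: the judge flags true failures with some true-positive rate and non-failures with some false-positive rate. The function name, the noise model, and the box bounds standing in for "domain-specific constraints" are all assumptions for illustration, not the paper's estimator.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_failure_rate(judge_flags, calib_true, calib_judge,
                          tpr_bounds=(0.5, 1.0), fpr_bounds=(0.0, 0.5)):
    """Toy constrained-MLE failure-rate estimator (assumed model).
    judge_flags: 0/1 judge verdicts on a large unlabeled corpus;
    calib_true / calib_judge: human gold labels and judge verdicts on
    a small calibration set; the bounds encode side information about
    the judge's true/false-positive rates."""
    judge_flags = np.asarray(judge_flags)
    calib_true = np.asarray(calib_true)
    calib_judge = np.asarray(calib_judge)

    n, k = len(judge_flags), int(judge_flags.sum())
    fail = calib_true == 1
    tp, nf = int(calib_judge[fail].sum()), max(int(fail.sum()), 1)
    fp, nn = int(calib_judge[~fail].sum()), max(int((~fail).sum()), 1)

    def nll(theta):
        p, tpr, fpr = theta
        q = tpr * p + fpr * (1 - p)   # marginal judge-positive rate
        eps = 1e-9
        # Joint binomial log-likelihood of the corpus verdicts and the
        # calibration-set confusion counts.
        ll = (k * np.log(q + eps) + (n - k) * np.log(1 - q + eps)
              + tp * np.log(tpr + eps) + (nf - tp) * np.log(1 - tpr + eps)
              + fp * np.log(fpr + eps) + (nn - fp) * np.log(1 - fpr + eps))
        return -ll

    # Domain constraints enter as box bounds on the judge's error rates.
    res = minimize(nll, x0=[max(k / n, 1e-3), 0.9, 0.1],
                   bounds=[(1e-6, 1 - 1e-6), tpr_bounds, fpr_bounds])
    return float(res.x[0])  # constrained-MLE estimate of failure rate
```

The design point this illustrates: the large judge-labeled corpus pins down the judge-positive rate q precisely, the small human set identifies the judge's error rates, and the constraints rule out degenerate explanations that would otherwise bias p.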

Self-Execution Simulation Improves Coding Models
arXiv:2604.03253v1 Announce Type: new
Abstract: A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces (textual explanations grounded in true execution) with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-pred…
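The "verifiable rewards" for the output-prediction objective admit a straightforward check: run the program on the given input and compare its actual stdout to the model's prediction. The sketch below is an assumed minimal form of such a reward; the function name and exact-match criterion are illustrative, not taken from the paper.

```python
import subprocess
import sys

def output_prediction_reward(code: str, stdin_text: str,
                             predicted_output: str, timeout: int = 5) -> float:
    """Verifiable reward for the output-prediction objective (assumed
    minimal form): 1.0 if the program's actual stdout on `stdin_text`
    matches the model's predicted output, else 0.0."""
    try:
        run = subprocess.run(
            [sys.executable, "-c", code], input=stdin_text,
            capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 0.0  # non-terminating or too-slow programs earn no reward
    if run.returncode != 0:
        return 0.0  # crashing programs earn no reward
    return 1.0 if run.stdout.strip() == predicted_output.strip() else 0.0
```

Because the reward is computed by actually executing the code, it is verifiable by construction and needs no learned critic, which is the usual appeal of this style of RL objective for coding models.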
