
RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication

arXiv | Submitted on 29 Mar 2026 | March 31, 2026 | 2 min read

arXiv:2603.27462v1 | Announce Type: cross

Authors: Mohsen Dehghankar, Abolfazl Asudeh

Abstract: Matrix-vector multiplication is a fundamental building block in neural networks, vector databases, and large language models, particularly during inference. As a result, efficient matrix-vector multiplication engines directly translate into more efficient inference. Recent work has explored low-bit quantization of model weights, where matrices are represented using binary (1-bit) or ternary (1.58-bit) values while activations are kept in higher precision. These representations enable efficient hardware-level computation. In parallel, algorithms such as Redundant Segment Reduction (RSR) provide theoretical guarantees for accelerating low-bit matrix-vector multiplication. However, existing implementations operate at the application level and cannot be efficiently integrated into hardware kernels, limiting practical performance. To bridge this gap, we present RSR-core, a high-performance engine that implements the RSR algorithm as optimized low-level kernels for both CPU and CUDA environments. RSR-core supports efficient matrix-vector multiplication for binary and ternary weight matrices and general vectors, while enabling practical deployment of the RSR algorithm in real inference pipelines. RSR-core is provided as a production-ready engine with HuggingFace integration for preprocessing low-bit models and running accelerated inference. Experimental results demonstrate significant performance improvements over baseline HuggingFace PyTorch multiplication, achieving up to 62x speedup on CPU and up to 1.9x speedup for token generation on CUDA for popular ternary LLMs. The source code is publicly available at this https URL.
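To make the setting concrete, here is a minimal pure-Python sketch of multiplying a ternary weight matrix (entries in {-1, 0, 1}) by a higher-precision vector. The first function is the plain baseline. The second illustrates, under simplifying assumptions, the general idea of reusing work when low-bit patterns repeat: columns are processed in blocks of width k, and rows that share the same k-entry pattern in a block share one partial dot product. This is only an illustration of pattern reuse, not the RSR algorithm as implemented in RSR-core; the function names and the block width `k` are hypothetical and do not come from the paper.

```python
def ternary_matvec_naive(W, x):
    """Baseline: y[i] = sum_j W[i][j] * x[j], with W[i][j] in {-1, 0, 1}."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def ternary_matvec_grouped(W, x, k=2):
    """Illustrative pattern reuse (hypothetical helper, not the RSR-core API).

    Columns are split into blocks of width k. Within a block, a ternary row
    segment can take at most 3**k distinct values, so each distinct segment's
    partial dot product is computed once and reused by every row sharing it.
    """
    y = [0.0] * len(W)
    for start in range(0, len(x), k):
        x_block = x[start:start + k]
        cache = {}  # segment pattern -> partial dot product for this block
        for i, row in enumerate(W):
            pattern = tuple(row[start:start + k])
            if pattern not in cache:
                cache[pattern] = sum(w * xj for w, xj in zip(pattern, x_block))
            y[i] += cache[pattern]
    return y
```

Because the weights are fixed at inference time, the mapping from rows to patterns could be computed once in a preprocessing step rather than on every call as this sketch does. The benefit of such reuse grows when the matrix has many more rows than the 3**k possible patterns per block.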

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Performance (cs.PF)

Cite as: arXiv:2603.27462 [cs.DS]

(or arXiv:2603.27462v1 [cs.DS] for this version)

https://doi.org/10.48550/arXiv.2603.27462

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mohsen Dehghankar
[v1] Sun, 29 Mar 2026 00:55:14 UTC (7,222 KB)
