🔥 LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
| Blog | Documentation | Join Slack | Interest Form | Roadmap
Summary
LMCache is an LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially in long-context scenarios. It stores the KV caches of reusable text across the datacenter (GPU, CPU, disk, and even S3), using a range of acceleration techniques (zero-copy CPU transfer, NIXL, GDS, and more). LMCache can reuse the KV cache of any repeated text (not necessarily a prefix) in any serving engine instance, saving precious GPU cycles and reducing user response delay.
By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
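The core idea can be illustrated with a toy sketch: chunk the token stream, hash each chunk, and serve previously computed KV entries from a store instead of recomputing them during prefill. The names below (`CHUNK`, `kv_store`, `compute_kv`) are hypothetical; the real LMCache stores attention tensors across GPU/CPU/disk tiers and handles far more than this sketch shows.

```python
# Toy illustration of a KV cache layer: chunk tokens, hash each chunk,
# and reuse cached "KV" entries on a hit instead of recomputing them.
# All names here are illustrative, not LMCache's actual API.
import hashlib

CHUNK = 4  # tokens per cache chunk (illustrative)
kv_store: dict[str, list[int]] = {}  # chunk hash -> precomputed "KV" data
recomputed_chunks = 0

def compute_kv(tokens: list[int]) -> list[int]:
    """Stand-in for the expensive prefill step that produces KV entries."""
    global recomputed_chunks
    recomputed_chunks += 1
    return [t * 2 for t in tokens]  # placeholder for attention K/V tensors

def prefill(tokens: list[int]) -> list[int]:
    """Fetch each chunk's KV from the store, computing only on a miss."""
    kv: list[int] = []
    for i in range(0, len(tokens), CHUNK):
        chunk = tokens[i:i + CHUNK]
        key = hashlib.sha256(bytes(chunk)).hexdigest()
        if key not in kv_store:      # miss: pay the compute cost once
            kv_store[key] = compute_kv(chunk)
        kv.extend(kv_store[key])     # hit: reuse, saving GPU cycles
    return kv

prefill(list(range(12)))                      # first request: 3 chunks computed
prefill(list(range(12)) + [99, 98, 97, 96])   # reuses 3 chunks, computes 1
print(recomputed_chunks)  # 4, not 7: shared chunks were served from cache
```

In a multi-round QA or RAG workload, the shared context behaves like the repeated chunks above, which is where the TTFT savings come from.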
LMCache is used, integrated, or referenced across a growing ecosystem of LLM serving platforms, infrastructure providers, and open-source projects:
For more details, please check our Ray Summit talk and technical report.
Features
- 🔥 Integration with vLLM v1, including:
  - High-performance CPU KVCache offloading
  - Disaggregated prefill
  - P2P KVCache sharing
- Integration with SGLang for KV cache offloading
- Storage backends: CPU, Disk, NIXL
- Installation through pip, compatible with the latest vLLM
Installation
To use LMCache, simply install lmcache from your package manager, e.g. pip:
pip install lmcache
Works on Linux with NVIDIA GPUs.
More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vLLM or are using another serving engine with different dependencies. Fixes for "undefined symbol" errors and torch version mismatches are also covered in the documentation.
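As a sketch of a typical setup, the snippet below installs LMCache and launches vLLM with LMCache as the KV connector. The exact flags, environment variables, and connector name can differ across vLLM/LMCache versions, so treat this as illustrative and defer to the docs for your setup.

```shell
# Install LMCache alongside vLLM (Linux + NVIDIA GPU).
pip install lmcache vllm

# Launch vLLM with LMCache as the KV cache connector. The connector name,
# config schema, and env vars below follow the LMCache quickstart examples
# and may differ in your version -- check the documentation.
LMCACHE_CHUNK_SIZE=256 LMCACHE_LOCAL_CPU=True \
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```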
Getting started
The best way to get started is to check out the Quickstart Examples in the docs.
Documentation
Check out the LMCache documentation which is available online.
We also post regularly in LMCache blogs.
Examples
Go hands-on with our examples, demonstrating how to address different use cases with LMCache.
Interested in Connecting?
Fill out the interest form, sign up for our newsletter, join LMCache slack, or drop an email, and our team will reach out to you!
Community meeting
The LMCache community meeting is hosted bi-weekly over Zoom. All are welcome to join!
Meetings are held on Tuesdays at 9:00 AM PT – Add to Google Calendar
We keep notes from each meeting on this document for summaries of standups, discussion, and action items.
Recordings of meetings are available on the YouTube LMCache channel.
Contributing
We welcome and value all contributions and collaborations. Please check out the Contributing Guide to learn how to contribute.
We continually update the [Onboarding] Welcoming contributors issue with good first issues!
Citation
If you use LMCache for your research, please cite our papers:
@inproceedings{liu2024cachegen,
  title={Cachegen: Kv cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author={Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title={CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year={2025},
  url={https://doi.org/10.1145/3689031.3696098},
  doi={10.1145/3689031.3696098},
  booktitle={Proceedings of the Twentieth European Conference on Computer Systems},
  pages={94--109}
}

@article{cheng2025lmcache,
  title={LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference},
  author={Cheng, Yihua and Liu, Yuhan and Yao, Jiayi and An, Yuwei and Chen, Xiaokun and Feng, Shaoting and Huang, Yuyang and Shen, Samuel and Du, Kuntai and Jiang, Junchen},
  journal={arXiv preprint arXiv:2510.09665},
  year={2025}
}
Socials
Linkedin | Twitter | Youtube
License
The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.