Lemonade by AMD: a fast and open source local LLM server using GPU and NPU
Comments
Refreshingly fast
on GPUs and NPUs
Open source. Private. Ready in minutes on any PC.
Chat
What can I do with 128 GB of unified RAM?
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.
What should I tune first?
You can use --no-mmap to speed up load times and increase context size to 64 or more.
Image Generation
A pitcher of lemonade in the style of a renaissance painting
Speech
Hello, I am your AI assistant. What can I do for you today?
Open Source
Built by the local AI community for every PC.
Lemonade exists because local AI should be free, open, fast, and private.
Ecosystem
Works with great apps.
Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.
Tech Specs
Built for practical local AI workflows.
Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.
Native C++ Backend
Lightweight service that is only 2MB.
One Minute Install
Simple installer that sets up the stack automatically.
OpenAI API Compatible
Works with hundreds of apps out-of-box and integrates in minutes.
Auto-configures for your hardware
Configures dependencies for your GPU and NPU.
Multi-engine compatibility
Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.
Multiple Models at Once
Run more than one model at the same time.
Cross-platform
A consistent experience across Windows, Linux, and macOS (beta).
Built-in app
A GUI that lets you download, try, and switch models quickly.
Unified API
One local service for every modality.
Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.
POST /api/v1/chat/completions
``
Latest Release
Always improving.
Track the newest improvements and highlights from the Lemonade release stream.
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
open source
The AI Stack: A Practical Guide to Building Your Own Intelligent Applications
Beyond the Hype: What Does "Building with AI" Actually Mean? Another week, another wave of AI headlines. From speculative leaks to existential debates, the conversation often orbits the sensational. But for developers, the real story is happening in the trenches: the practical, stack-by-stack integration of intelligence into real applications. While the industry debates "how it happened," we're busy figuring out how to use it . Forget the monolithic "AI" label for a moment. Modern AI application development is less about creating a sentient being and more about strategically assembling a set of powerful, specialized tools. It's about choosing the right component for the job—be it generating text, analyzing images, or making predictions—and wiring it into your existing systems. This guide b

How I Discovered the Hidden Cost of "Lightweight" Python Packages
The "It's Just a Small Library" Trap We've all been there. You find a Python package that promises to solve your problem with minimal overhead. The README says "lightweight," the GitHub stars look good, and the developer swears it's "just a few kilobytes." So you install it, run your project, and wonder why your Docker image grew by 200MB. What happened? The package is small. But its dependencies aren't. And those dependencies have dependencies. And those... you get the idea. The Moment I Realized Something Was Missing I was comparing HTTP libraries for a new project. requests is popular, but everyone says it's "heavy." Then I found a library that claimed to be a "lightweight alternative." But something in my gut said "let me check." So I built pip-size — a tool that calculates the real do
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Models

Semantic matching in graph space without matrix computation and hallucinations and no GPU
Hello AI community,For the past few months, I’ve been rethinking how AI should process language and logic. Instead of relying on heavy matrix multiplications (Attention mechanisms) to statistically guess the next word inside an unexplainable black box, I asked a different question: What if concepts existed in a physical, multi-dimensional graph space where logic is visually traceable?I am excited to share our experimental architecture. To be absolutely clear: this is not a GraphRAG system built on top of an existing LLM. This is a standalone Native Graph Cognitive Engine.The Core Philosophy:Zero-Black-Box (Total Explainability): Modern LLMs are black boxes; you never truly know why they chose a specific token. Our engine is a “glass brain.” Every logical leap and every generated sentence i
b8679
llama-bench: add -fitc and -fitt to arguments ( #21304 ) llama-bench: add -fitc and -fitt to arguments update README.md address review comments update compare-llama-bench.py macOS/iOS: macOS Apple Silicon (arm64) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Windows: Windows x64 (CPU) Windows arm64 (CPU) Windows x64 (CUDA 12) - CUDA 12.4 DLLs Windows x64 (CUDA 13) - CUDA 13.1 DLLs Windows x64 (Vulkan) Windows x64 (SYCL) Windows x64 (HIP) openEuler: openEuler x86 (310p) openEuler x86 (910b, ACL Graph) openEuler aarch64 (310p) openEuler aarch64 (910b, ACL Graph)

15 Datasets for Training and Evaluating AI Agents
Datasets for training and evaluating AI agents are the foundation of reliable agentic systems. Agents don’t magically work — they need structured data that teaches action-taking: tool calling, web interaction, and multi-step planning. Just as importantly, they need evaluation datasets that catch regressions before those failures hit production. This is where most teams struggle. A chat model can sound correct while failing at execution, like returning invalid JSON, calling the wrong API, clicking the wrong element, or generating code that doesn’t actually fix the issue. In agentic workflows, those small failures compound across steps, turning minor errors into broken pipelines. That’s why datasets for training and evaluating AI agents should be treated as infrastructure, not a one-time res

The Minds Shaping AI: Meet the Keynote Speakers at ODSC AI East 2026
If you want to understand where AI is actually going, not just what’s trending, you look at who’s building it, scaling it, and questioning its limits. That’s exactly what the ODSC AI East 2026 keynote speakers lineup delivers. This year’s speakers span the full spectrum of AI: from foundational theory and cutting-edge research to enterprise deployment, governance, and workforce transformation. These are the people defining how AI moves from hype to real-world impact. Here’s who you’ll hear from and why missing them would mean missing where AI is headed next. The ODSC AI East 2026 Keynote Speakers Matt Sigelman, President at Burning Glass Institute Matt Sigelman is one of the foremost experts on labor market dynamics and the future of work. As President of the Burning Glass Institute, he ha


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!