Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

Hacker NewsApril 2, 20262 min read1 views

Source Quiz

Comments

Refreshingly fast

on GPUs and NPUs

Open source. Private. Ready in minutes on any PC.

Chat

What can I do with 128 GB of unified RAM?

Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.

What should I tune first?

You can use --no-mmap to speed up load times and increase context size to 64 or more.

Image Generation

A pitcher of lemonade in the style of a renaissance painting

Speech

Hello, I am your AI assistant. What can I do for you today?

Open Source

Built by the local AI community for every PC.

Lemonade exists because local AI should be free, open, fast, and private.

Ecosystem

Works with great apps.

Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.

Tech Specs

Built for practical local AI workflows.

Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.

Native C++ Backend

Lightweight service that is only 2MB.

One Minute Install

Simple installer that sets up the stack automatically.

OpenAI API Compatible

Works with hundreds of apps out-of-box and integrates in minutes.

Auto-configures for your hardware

Configures dependencies for your GPU and NPU.

Multi-engine compatibility

Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.

Multiple Models at Once

Run more than one model at the same time.

Cross-platform

A consistent experience across Windows, Linux, and macOS (beta).

Built-in app

A GUI that lets you download, try, and switch models quickly.

Unified API

One local service for every modality.

Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.

POST /api/v1/chat/completions

Latest Release

Always improving.

Track the newest improvements and highlights from the Lemonade release stream.

Original source

Hacker News

https://lemonade-server.ai

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

open source

ProductsLive

The AI Stack: A Practical Guide to Building Your Own Intelligent Applications

Beyond the Hype: What Does "Building with AI" Actually Mean? Another week, another wave of AI headlines. From speculative leaks to existential debates, the conversation often orbits the sensational. But for developers, the real story is happening in the trenches: the practical, stack-by-stack integration of intelligence into real applications. While the industry debates "how it happened," we're busy figuring out how to use it . Forget the monolithic "AI" label for a moment. Modern AI application development is less about creating a sentient being and more about strategically assembling a set of powerful, specialized tools. It's about choosing the right component for the job—be it generating text, analyzing images, or making predictions—and wiring it into your existing systems. This guide b

DEV Community

7m36 minutes ago

ReleasesFresh

North Korea s hijack of one of the web s most used open source projects was likely weeks in the making

North Korean hackers pushed out malicious updates to a popular open source project by hacking a top developer's computer in a long-running campaign.

TechCrunch AI

1mabout 3 hours ago

ReleasesLive

How I Discovered the Hidden Cost of "Lightweight" Python Packages

The "It's Just a Small Library" Trap We've all been there. You find a Python package that promises to solve your problem with minimal overhead. The README says "lightweight," the GitHub stars look good, and the developer swears it's "just a few kilobytes." So you install it, run your project, and wonder why your Docker image grew by 200MB. What happened? The package is small. But its dependencies aren't. And those dependencies have dependencies. And those... you get the idea. The Moment I Realized Something Was Missing I was comparing HTTP libraries for a new project. requests is popular, but everyone says it's "heavy." Then I found a library that claimed to be a "lightweight alternative." But something in my gut said "let me check." So I built pip-size — a tool that calculates the real do

DEV Community

3mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 211 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

ModelsLive

Semantic matching in graph space without matrix computation and hallucinations and no GPU

Hello AI community,For the past few months, I’ve been rethinking how AI should process language and logic. Instead of relying on heavy matrix multiplications (Attention mechanisms) to statistically guess the next word inside an unexplainable black box, I asked a different question: What if concepts existed in a physical, multi-dimensional graph space where logic is visually traceable?I am excited to share our experimental architecture. To be absolutely clear: this is not a GraphRAG system built on top of an existing LLM. This is a standalone Native Graph Cognitive Engine.The Core Philosophy:Zero-Black-Box (Total Explainability): Modern LLMs are black boxes; you never truly know why they chose a specific token. Our engine is a “glass brain.” Every logical leap and every generated sentence i

discuss.huggingface.co

2m34 minutes ago

ModelsLive

b8679

llama-bench: add -fitc and -fitt to arguments ( #21304 ) llama-bench: add -fitc and -fitt to arguments update README.md address review comments update compare-llama-bench.py macOS/iOS: macOS Apple Silicon (arm64) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Windows: Windows x64 (CPU) Windows arm64 (CPU) Windows x64 (CUDA 12) - CUDA 12.4 DLLs Windows x64 (CUDA 13) - CUDA 13.1 DLLs Windows x64 (Vulkan) Windows x64 (SYCL) Windows x64 (HIP) openEuler: openEuler x86 (310p) openEuler x86 (910b, ACL Graph) openEuler aarch64 (310p) openEuler aarch64 (910b, ACL Graph)

llama.cpp Releases

1mabout 1 hour ago

ModelsFresh

15 Datasets for Training and Evaluating AI Agents

Datasets for training and evaluating AI agents are the foundation of reliable agentic systems. Agents don’t magically work — they need structured data that teaches action-taking: tool calling, web interaction, and multi-step planning. Just as importantly, they need evaluation datasets that catch regressions before those failures hit production. This is where most teams struggle. A chat model can sound correct while failing at execution, like returning invalid JSON, calling the wrong API, clicking the wrong element, or generating code that doesn’t actually fix the issue. In agentic workflows, those small failures compound across steps, turning minor errors into broken pipelines. That’s why datasets for training and evaluating AI agents should be treated as infrastructure, not a one-time res

ODSC Medium

5mabout 5 hours ago

ModelsLive

The Minds Shaping AI: Meet the Keynote Speakers at ODSC AI East 2026

If you want to understand where AI is actually going, not just what’s trending, you look at who’s building it, scaling it, and questioning its limits. That’s exactly what the ODSC AI East 2026 keynote speakers lineup delivers. This year’s speakers span the full spectrum of AI: from foundational theory and cutting-edge research to enterprise deployment, governance, and workforce transformation. These are the people defining how AI moves from hype to real-world impact. Here’s who you’ll hear from and why missing them would mean missing where AI is headed next. The ODSC AI East 2026 Keynote Speakers Matt Sigelman, President at Burning Glass Institute Matt Sigelman is one of the foremost experts on labor market dynamics and the future of work. As President of the Burning Glass Institute, he ha

ODSC Medium

6mabout 1 hour ago