Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessArtificial intelligence hype cycle risks collapse with market implications - MSNGoogle News: AIMeta paused its work with AI training startup Mercor after a data breachBusiness Insider[R], 31 MILLIONS High frequency data, Light GBM worked perfectlyReddit r/MachineLearningConsidering NeurIPS submission [D]Reddit r/MachineLearningAutomate Your Handyman Pricing: The True Hourly Cost AI ForgetsDev.to AIScience Is Not a Reading ProblemMedium AIHow Antigravity AI Changed My React Workflow (In Ways I Didn’t Expect)Medium AIToken Usage Is the New RAM UsageDev.to AIStop Writing Rules for AI AgentsDev.to AIUsing AI as your therapist?Medium AIDigital Marketing Trends and the Role of AI in Modern Business StrategiesMedium AIThe AI Pen: Collaborating With Artificial Intelligence Without Losing Your Unique VoiceMedium AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessArtificial intelligence hype cycle risks collapse with market implications - MSNGoogle News: AIMeta paused its work with AI training startup Mercor after a data breachBusiness Insider[R], 31 MILLIONS High frequency data, Light GBM worked perfectlyReddit r/MachineLearningConsidering NeurIPS submission [D]Reddit r/MachineLearningAutomate Your Handyman Pricing: The True Hourly Cost AI ForgetsDev.to AIScience Is Not a Reading ProblemMedium AIHow Antigravity AI Changed My React Workflow (In Ways I Didn’t Expect)Medium AIToken Usage Is the New RAM UsageDev.to AIStop Writing Rules for AI AgentsDev.to AIUsing AI as your therapist?Medium AIDigital Marketing Trends and the Role of AI in Modern Business StrategiesMedium AIThe AI Pen: Collaborating With Artificial Intelligence Without Losing Your Unique VoiceMedium AI
AI NEWS HUBbyEIGENVECTOREigenvector

VRAM optimization for gemma 4

Reddit r/LocalLLaMAby /u/Sadman782 https://www.reddit.com/user/Sadman782April 3, 20262 min read2 views
Source Quiz
🧒Explain Like I'm 5Simple language

Hey little friend! Imagine your computer is like a big toy box, and inside, there's a super-smart robot brain named Gemma! 🤖

Sometimes, Gemma needs a special crayon box called "VRAM" to draw its thoughts. But Gemma was being a bit greedy and taking up too many crayons, even before it started drawing! This made the crayon box full too fast. 🖍️💨

But guess what? Smart grown-ups found a magic trick! They added a secret word, -np 1, to Gemma's instructions. It's like telling Gemma, "Hey, only you are drawing right now, so don't grab ALL the crayons!" ✨

Now, Gemma uses much fewer crayons, and there's more space in the crayon box for all its amazing drawings! Hooray! 🎉

TLDR: add -np 1 to your llama.cpp launch command if you are the only user, cuts SWA cache VRAM by 3x instantly So I was messing around with Gemma 4 and noticed the dense model hogs a massive chunk of VRAM before you even start generating anything. If you are on 16GB you might be hitting OOM and wondering why. The culprit is the SWA (Sliding Window Attention) KV cache. It allocates in F16 and does not get quantized like the rest of the KV cache. A couple days ago ggerganov merged a PR that accidentally made this worse by keeping the SWA portion unquantized even when you have KV cache quantization enabled. It got reverted about 2 hours later here https://github.com/ggml-org/llama.cpp/pull/21332 so make sure you are on a recent build. A few things that actually help with VRAM: The SWA cache s

Could not retrieve the full article text.

Read on Reddit r/LocalLLaMA →
Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
VRAM optimi…llamamodellaunchgithubllama.cppquantizationReddit r/Lo…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 187 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Open Source AI