
My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

Reddit r/LocalLLaMA, by /u/Iory1998 (https://www.reddit.com/user/Iory1998), April 3, 2026

I mean, I have 40GB of VRAM and I still cannot fit the entire Unsloth Gemma-4-31B-it-UD-Q8 (35GB), even at 2K context size, unless I quantize the KV cache to Q4. WTF? For comparison, I can fit the entire UD-Q8 Qwen3.5-27B at full context without KV quantization! If I have to run a Q4 Gemma-4-31B-it-UD with a Q8 KV cache, then I am better off just using Qwen3.5-27B. After all, the latter beats the former in basically all benchmarks. What's your experience with the Gemma-4 models so far?
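For anyone who wants to sanity-check numbers like these, here is a rough sketch of the usual back-of-the-envelope KV cache estimate for a plain full-attention transformer. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real Gemma-4 or Qwen3.5 configs, and the bytes-per-element values ignore the small per-block scale overhead of quantized cache types. Compute buffers and any sliding-window or hybrid attention layers are not modeled either, so treat the output as an order-of-magnitude guide only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size in bytes for a full-attention transformer.

    The factor of 2 accounts for storing both keys and values in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical architecture values, chosen only for illustration.
# Replace them with the numbers from the model's own config file.
example_cfg = dict(n_layers=60, n_kv_heads=16, head_dim=128)

# Approximate bytes per cached element for common cache types
# (assumption: block-scale overhead of quantized types is ignored).
cache_types = [("F16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]

if __name__ == "__main__":
    for label, bytes_per_elem in cache_types:
        size = kv_cache_bytes(
            context_len=2048, bytes_per_elem=bytes_per_elem, **example_cfg
        )
        print(f"{label} KV cache at 2K context: {size / 2**30:.2f} GiB")
```

The estimate scales linearly with context length and with the number of KV heads, which is why two models of similar parameter count can need very different amounts of cache memory.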
