
AI NEWS HUB by EIGENVECTOR

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark


Reddit r/LocalLLaMA, by /u/PerceptionGrouchy187 (https://www.reddit.com/user/PerceptionGrouchy187), April 3, 2026

Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression.

System Specs

GPU: NVIDIA GeForce RTX 5090 (32 GB VRAM)
CPU: AMD Ryzen 9 9950X3D (16-core)
RAM: 64 GB DDR5
OS: Windows 11

Setup

Model: gemma-4-31B-it-UD-Q4_K_XL from Unsloth (17.46 GiB)
Build: TheTom/llama-cpp-turboquant, branch feature/turboquant-kv-cache, merged with the latest upstream master for Gemma 4 support
KV cache: turbo3 (3-bit PolarQuant + Hadamard rotation, ~4.5x compression vs. f16)
Config: --n-gpu-layers 99 --no-mmap --flash-attn on --cache-type-k turbo3 --cache-type-v turbo3

Benchmark Results (ppN = prompt processing of N tokens, tgN = generation of N tokens)

Test        Speed (t/s)
pp4096         3,362.71
pp16384        3,047.00
pp65536        2,077.96
pp131072       1,428.80
pp262144         899.55
tg128             61.51

VRAM usage at 262K context: 27.7 GB / 32 GB (4.3 GB headroom) …
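The post describes turbo3 only as "3-bit PolarQuant + Hadamard rotation." The following is a rough illustration of why a rotation helps before low-bit quantization. It is a minimal NumPy sketch of a generic scheme, not the fork's actual kernel. It applies an orthonormal Walsh-Hadamard transform to a vector, quantizes the rotated values to 3 bits with one absmax scale per block, and then inverts both steps. PolarQuant itself is not reproduced. The function names, the vector length of 128, the 32-value block size, and the 16-bit per-block scale used in the size estimate are all hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: generic Hadamard rotation + 3-bit block quantization.
# NOT the TurboQuant/PolarQuant kernel; block size and scale format are assumptions.
BLOCK = 32  # assumed number of values sharing one scale

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform of a 1-D power-of-two-length vector.
    Because H/sqrt(n) is symmetric and orthonormal, the transform is its own inverse."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.size
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sum
            x[i + h:i + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return x / np.sqrt(n)

def quantize_3bit(x, block=BLOCK):
    """Symmetric 8-level (3-bit) midrise quantizer with one absmax scale per block."""
    blocks = np.asarray(x, dtype=np.float64).reshape(-1, block)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(absmax > 0, absmax / 3.5, 1.0)
    codes = np.clip(np.round(blocks / scale + 3.5), 0, 7).astype(np.uint8)  # codes 0..7
    return codes, scale

def dequantize_3bit(codes, scale):
    """Map codes back to the half-integer levels -3.5..3.5, times the block scale."""
    return ((codes.astype(np.float64) - 3.5) * scale).reshape(-1)

def roundtrip_mse(x, rotate):
    """Mean squared error after quantize/dequantize, optionally inside a rotation."""
    y = fwht(x) if rotate else x
    y_hat = dequantize_3bit(*quantize_3bit(y))
    x_hat = fwht(y_hat) if rotate else y_hat  # undo the rotation
    return float(np.mean((x - x_hat) ** 2))

def compression_ratio(bits=3, block=BLOCK, scale_bits=16):
    """Size ratio vs. f16, counting one scale per block (assumed layout)."""
    return 16 / (bits + scale_bits / block)

rng = np.random.default_rng(0)
v = rng.normal(size=128)  # stand-in vector; length 128 is an assumption
v[5] = 40.0               # a single large outlier channel

err_plain = roundtrip_mse(v, rotate=False)
err_rot = roundtrip_mse(v, rotate=True)
print(f"MSE without rotation: {err_plain:.4f}")
print(f"MSE with rotation:    {err_rot:.4f}")
print(f"approx. compression vs f16: {compression_ratio():.2f}x")
```

Without rotation, the outlier inflates its block's scale, so the other 31 values in that block are quantized very coarsely. The rotation spreads that energy across all coordinates before quantizing, which lowers the error. With the assumed layout, storage works out to 3.5 bits per value, or about 4.57x smaller than f16. That is in the same range as the ~4.5x the post reports, but turbo3's real layout may differ.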

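The prompt-processing figures can also be turned into a rough time-to-first-token for each prompt length. This sketch simply divides each prompt length by the reported speed. It assumes each ppN figure is an average over the whole N-token prompt, as in llama-bench output, and it ignores model-load time. The dictionary and function names are illustrative only.

```python
# Reported prompt-processing speeds from the post, in tokens/second
PP_TPS = {
    4096: 3362.71,
    16384: 3047.00,
    65536: 2077.96,
    131072: 1428.80,
    262144: 899.55,
}
VRAM_TOTAL_GB = 32.0
VRAM_USED_GB = 27.7  # reported usage at 262K context

def prefill_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock estimate for a prompt of n_tokens processed at a steady average rate."""
    return n_tokens / tokens_per_second

for n, tps in PP_TPS.items():
    print(f"pp{n:<7} {tps:9.2f} t/s -> {prefill_seconds(n, tps):6.1f} s")
print(f"VRAM headroom: {VRAM_TOTAL_GB - VRAM_USED_GB:.1f} GB")
```

Under that assumption, a full 262,144-token prompt takes about 291 s (just under five minutes) to process before generation starts. At that length, average prefill throughput is roughly 27% of the 4,096-token rate, which is consistent with attention cost growing with context. The reported VRAM figures leave 4.3 GB free, matching the headroom stated in the post.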


Original source

Reddit r/LocalLLaMA

https://www.reddit.com/r/LocalLLaMA/comments/1sbdihw/gemma_4_31b_at_256k_full_context_on_a_single_rtx/
