Gemma 4 running on Raspberry Pi 5
To be specific: a Raspberry Pi 5 (8GB) with an SSD (though the speed is the same on the non-SSD unit), running Potato OS with the latest llama.cpp branch compiled from source. The model is Gemma 4 E2B, the Unsloth variety. submitted by /u/jslominski
Read on Reddit r/LocalLLaMA → https://www.reddit.com/r/LocalLLaMA/comments/1sarlb8/gemma_4_running_on_raspberry_pi5/

Monarch v3: 78% Faster LLM Inference with NES-Inspired KV Paging
TL;DR: We implemented NES-inspired memory paging for transformers. On a 1.1B parameter model, inference is now 78% faster (17.01 → 30.42 tok/sec) with nearly zero VRAM overhead. The algorithm is open source, fully benchmarked, and ready to use.

The Problem

KV cache grows linearly with sequence length. By 4K tokens, most of it sits unused—recent tokens matter far more than old ones, yet we keep everything in VRAM at full precision. Standard approaches (quantization, pruning, distillation) are invasive. We wanted something simpler: just move the old stuff out of the way.

The Solution: NES-Inspired Paging

Think of it like a Game Boy's memory banking system. The cache is split into a hot region (recent tokens, full precision) and a cold region (older tokens, compressed). As new tokens arrive,
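A minimal sketch of the hot/cold split as described above. The class name, the demotion policy, and the 8-bit per-vector compression scheme are my own illustration under the post's stated assumptions, not Monarch's actual implementation:

```python
from collections import deque

class PagedKVCache:
    """Toy hot/cold KV cache: recent entries stay at full precision,
    older ones are demoted to a compressed (8-bit) cold region."""

    def __init__(self, hot_capacity=4):
        self.hot_capacity = hot_capacity
        self.hot = deque()   # recent token vectors, full precision
        self.cold = []       # (quantized ints, scale) pairs for old tokens

    def append(self, kv):
        self.hot.append(list(kv))
        # Once the hot region overflows, demote the oldest entry:
        # quantize it to 8-bit integers with a per-vector scale.
        while len(self.hot) > self.hot_capacity:
            old = self.hot.popleft()
            scale = max((abs(x) for x in old), default=0.0) / 127.0 or 1.0
            self.cold.append(([round(x / scale) for x in old], scale))

    def full_cache(self):
        """Reconstruct the whole cache; cold entries are dequantized."""
        return [[q * s for q in qs] for qs, s in self.cold] + \
               [list(v) for v in self.hot]
```

The point of the design is that only `hot_capacity` vectors ever occupy full-precision storage, so VRAM for the cache stops growing with sequence length while old tokens remain recoverable (lossily) from the cold region.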

Quantizers appreciation post
Hey everyone, yesterday I decided to learn how to quantize GGUFs myself with reasonable quality, in order to understand the magic behind the curtain. Holy... I did not expect how much work it is, how long it takes, or that it requires a LOT (500GB!) of storage space for just Gemma-4-26B-A4B in various sizes. There really is an art to configuring them too, with variations between architectures and quant types. Thanks to Unsloth releasing their imatrix file and Hugging Face showing the weight types in their viewer, I managed to cobble something together without LLM assistance. I ran into a few hiccups and some of the information is a bit confusing, so I documented my process in the hopes of making it easier for someone else to learn and experiment. My recipe and full setup guide can be f
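For readers wondering what the imatrix the post mentions actually buys you: it is an importance matrix that weights quantization error by how much each weight's activations matter, so the quantizer spends its limited precision where it hurts least. A toy sketch of that idea (the brute-force scale search and all names here are my own illustration, not llama.cpp's actual quantization code):

```python
def quantize_4bit(weights, importance=None, n_scales=64):
    """Pick a 4-bit scale that minimizes (importance-weighted)
    squared reconstruction error. Returns (ints in [-8, 7], scale)."""
    imp = importance or [1.0] * len(weights)
    wmax = max(abs(w) for w in weights) or 1.0
    best = None
    for k in range(1, n_scales + 1):
        scale = wmax / 7.0 * k / n_scales   # candidate scales to try
        q = [max(-8, min(7, round(w / scale))) for w in weights]
        err = sum(i * (w - qi * scale) ** 2
                  for w, qi, i in zip(weights, q, imp))
        if best is None or err < best[0]:
            best = (err, q, scale)
    return best[1], best[2]
```

With a uniform importance vector this is plain round-to-nearest with a searched scale; passing a skewed importance vector biases the chosen scale toward reconstructing the high-importance weights accurately, which is roughly the role the imatrix plays in real GGUF quantization.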
More in Open Source AI

trunk/6c6e22937db24fe8c7b74452a6d3630c65d1c8b8: Revert "Remove TRITON=yes from CPU-only GCC11 docker configs (#179314)"
This reverts commit 670be7c. Reverted #179314 on behalf of https://github.com/izaitsevfb: reverted automatically by PyTorch's autorevert; to avoid this behaviour, add the tag "autorevert: disable".


