
Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp

Reddit r/LocalLLaMA · by /u/RecognitionFlat1470 (https://www.reddit.com/user/RecognitionFlat1470) · April 2, 2026 · 1 min read

I've got SmolLM2-360M running on a Samsung Galaxy Watch 4 Classic (about 380MB free RAM) by tweaking llama.cpp and the underlying ggml memory model.

By default, the model was being loaded twice in RAM: once via the APK's mmap page cache and again via ggml's tensor allocations, peaking at 524MB for a 270MB model. The fix: I pass host_ptr into llama_model_params, so CPU tensors point directly into the mmap region and only Vulkan tensors are copied.

On real hardware this gives:

- Peak RAM: 524MB → 142MB (74% reduction)
- First boot: 19s → 11s
- Second boot: ~2.5s (mmap + KV cache warm)

Code: https://github.com/Perinban/llama.cpp/tree/axon-dev

Longer write-up with VmRSS traces and design notes: https://www.linkedin.com/posts/perinban-parameshwaran_machinelearning-llm-embeddedai-activity-74453741179
