
Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp

Reddit r/LocalLLaMA · by /u/RecognitionFlat1470 (https://www.reddit.com/user/RecognitionFlat1470) · April 2, 2026 · 1 min read

I've got SmolLM2-360M running on a Samsung Galaxy Watch 4 Classic (about 380MB free RAM) by tweaking llama.cpp and the underlying ggml memory model.

By default, the model was being loaded twice in RAM: once via the APK's mmap page cache and again via ggml's tensor allocations, peaking at 524MB for a 270MB model. The fix: I pass host_ptr into llama_model_params, so CPU tensors point directly into the mmap region and only Vulkan tensors are copied.

On real hardware this gives:

- Peak RAM: 524MB → 142MB (74% reduction)
- First boot: 19s → 11s
- Second boot: ~2.5s (mmap + KV cache warm)

Code: https://github.com/Perinban/llama.cpp/tree/axon-dev

Longer write-up with VmRSS traces and design notes: https://www.linkedin.com/posts/perinban-parameshwaran_machinelearning-llm-embeddedai-activity-74453741179
