AI NEWS HUB by Eigenvector

Gemma4 26B A4B runs easily on 16GB Macs

Reddit r/LocalLLaMA · by /u/FenderMoon (https://www.reddit.com/user/FenderMoon) · April 4, 2026

Typically, models in the 26B-class range are difficult to run on 16GB Macs because GPU acceleration requires the accelerated layers to sit entirely within wired memory. It's possible with aggressive quants (2-bit, or maybe a very lightweight IQ3_XXS), but quality degrades significantly. However, if the model is run entirely on the CPU instead (which is much more feasible with MoE models), it's possible to run very good quants even when the model ends up larger than the entire available system RAM. There is some performance loss from swapping experts in and out, but I find the loss is much less than I would have expected. I was able to easily achieve 6-10 tps with a context window of 8-16K on my M2 MacBook Pro (tested using IQ4_NL and Q5_K_S). Far from fast, but
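The RAM squeeze described above can be sketched with some back-of-the-envelope arithmetic. The bits-per-weight figures below are rough community estimates for llama.cpp quant types (actual GGUF sizes vary per tensor and include some overhead), not official numbers:

```python
# Rough GGUF size estimate for a 26B-parameter model under
# different llama.cpp quantization types. Bits-per-weight values
# are approximate; real files are somewhat larger due to
# mixed-precision tensors and metadata.
PARAMS = 26e9

QUANT_BPW = {  # approximate effective bits per weight (assumed)
    "IQ2_XXS": 2.1,
    "IQ3_XXS": 3.1,
    "IQ4_NL": 4.5,
    "Q5_K_S": 5.5,
}

def model_size_gb(params: float, bpw: float) -> float:
    """Approximate model size in GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9

for quant, bpw in QUANT_BPW.items():
    size = model_size_gb(PARAMS, bpw)
    # Assume roughly 12 GB of a 16 GB Mac is usable after the
    # OS and other processes take their share.
    verdict = "fits" if size < 12 else "exceeds"
    print(f"{quant}: ~{size:.1f} GB ({verdict} ~12 GB usable RAM)")
```

By this estimate only the 2-3 bit quants fit in wired memory for GPU offload, while IQ4_NL and Q5_K_S land around 15-18 GB, which is exactly the regime where CPU-only inference with the OS paging experts in and out becomes the workable option for an MoE model.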

Could not retrieve the full article text.

Read on Reddit r/LocalLLaMA →