Gemma4 26B A4B runs easily on 16GB Macs
Typically, 26B-class models are difficult to run on 16 GB Macs, because GPU acceleration requires the accelerated layers to sit entirely in wired memory. It's possible with aggressive quants (2-bit, or perhaps a very lightweight IQ3_XXS), but quality degrades significantly. However, running entirely on the CPU instead (much more feasible with MoE models) makes it possible to use really good quants, even when the model ends up larger than the entire available system RAM. There is some performance loss from swapping experts in and out, but far less than I would have expected: I easily achieved 6-10 tps with an 8-16K context window on my M2 MacBook Pro (tested with IQ4_NL and Q5_K_S). Far from fast, but
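The claim that good quants outgrow 16 GB of RAM is easy to check with back-of-the-envelope arithmetic. The sketch below estimates the weight footprint of a 26B-parameter model at a few llama.cpp quant levels; the bits-per-weight figures are approximate averages for each quant type, not exact GGUF file sizes, and ignore KV-cache and runtime overhead.

```python
# Rough estimate of weight footprint for a 26B-parameter model at
# common llama.cpp quantization levels. Bits-per-weight values are
# approximate averages, not exact GGUF sizes.
PARAMS = 26e9
RAM_GB = 16
QUANTS = {
    "IQ3_XXS": 3.06,  # very aggressive, noticeable quality loss
    "IQ4_NL":  4.50,
    "Q5_K_S":  5.54,
}

for name, bpw in QUANTS.items():
    gib = PARAMS * bpw / 8 / 1024**3          # bits -> bytes -> GiB
    verdict = "fits" if gib < RAM_GB else "exceeds 16 GB (relies on mmap paging)"
    print(f"{name}: ~{gib:.1f} GiB - {verdict}")
```

At ~5.5 bits per weight, Q5_K_S lands around 17 GiB for the weights alone, so a CPU-only run depends on llama.cpp's default mmap behavior to page experts in from disk as needed; in llama.cpp terms, CPU-only execution corresponds to `-ngl 0` (no layers offloaded to the GPU).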
Could not retrieve the full article text.
Read on Reddit r/LocalLLaMA → https://www.reddit.com/r/LocalLLaMA/comments/1scjoox/gemma4_26b_a4b_runs_easily_on_16gb_macs/
