Gemma4 26B A4B runs easily on 16GB Macs
Typically, 26B-class models are difficult to run on 16 GB Macs, because GPU acceleration requires the accelerated layers to sit entirely in wired memory. It's possible with aggressive quants (2-bit, or perhaps a very lightweight IQ3_XXS), but quality degrades significantly. However, running entirely on the CPU instead (much more feasible with MoE models) makes it possible to use really good quants, even when the model ends up larger than the entire available system RAM. There is some performance loss from swapping experts in and out, but far less than I would have expected: I easily achieved 6-10 tps with an 8-16K context window on my M2 MacBook Pro (tested with IQ4_NL and Q5_K_S). Far from fast, but
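The claim that good quants outgrow 16 GB of RAM is easy to check with back-of-the-envelope arithmetic. The sketch below estimates the weight footprint of a 26B-parameter model at a few llama.cpp quant levels; the bits-per-weight figures are approximate averages for each quant type, not exact GGUF file sizes, and ignore KV-cache and runtime overhead.

```python
# Rough estimate of weight footprint for a 26B-parameter model at
# common llama.cpp quantization levels. Bits-per-weight values are
# approximate averages, not exact GGUF sizes.
PARAMS = 26e9
RAM_GB = 16
QUANTS = {
    "IQ3_XXS": 3.06,  # very aggressive, noticeable quality loss
    "IQ4_NL":  4.50,
    "Q5_K_S":  5.54,
}

for name, bpw in QUANTS.items():
    gib = PARAMS * bpw / 8 / 1024**3          # bits -> bytes -> GiB
    verdict = "fits" if gib < RAM_GB else "exceeds 16 GB (relies on mmap paging)"
    print(f"{name}: ~{gib:.1f} GiB - {verdict}")
```

At ~5.5 bits per weight, Q5_K_S lands around 17 GiB for the weights alone, so a CPU-only run depends on llama.cpp's default mmap behavior to page experts in from disk as needed; in llama.cpp terms, CPU-only execution corresponds to `-ngl 0` (no layers offloaded to the GPU).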
Could not retrieve the full article text.
Read on Reddit r/LocalLLaMA → https://www.reddit.com/r/LocalLLaMA/comments/1scjoox/gemma4_26b_a4b_runs_easily_on_16gb_macs/
