APEX MoE quantized models: 33% faster inference, plus TurboQuant (14% speedup in prompt processing)
I've just released APEX (Adaptive Precision for EXpert Models): a novel MoE quantization technique that outperforms Unsloth Dynamic 2.0 on accuracy while being 2x smaller for MoE architectures. Benchmarked on Qwen3.5-35B-A3B, but the method applies to any MoE model. Half the size of Q8. Perplexity comparable to F16. Works with stock llama.cpp with no patches. Open source (of course!), with the github.com/mudler/LocalAI team!

[Image: benchmark chart — https://preview.redd.it/uv2bnfheymsg1.jpg?width=1632&format=pjpg&auto=webp&s=3eca979e8f9ca6b75d206eecdf29308b74aed530]

Perplexity by itself doesn't tell the full story. KL divergence tells a story perplexity doesn't:

[Image: KL divergence chart — https://preview.redd.it/jn9ua2ksymsg1.jpg?width=1617&format=pjpg&auto=webp&s=7df969308e10aa6b6d31098c92fca1c14bb42a40]

Tiers for every GPU:
- I-Quality: 21.3 GB
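To make the perplexity-vs-KL point concrete: two models can land at nearly the same perplexity while their full next-token distributions drift apart, and KL divergence against the full-precision reference measures exactly that drift. Here is a minimal sketch (not APEX's evaluation harness; the shapes and toy data are illustrative assumptions) of the computation:

```python
# Minimal sketch, not APEX's evaluation code: given next-token logits from
# an F16 reference model and a quantized model over the same token positions,
# compute the mean KL(P_ref || P_quant).
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL divergence per token position, reference || quantized."""
    p = softmax(ref_logits)    # F16 reference distribution
    q = softmax(quant_logits)  # quantized-model distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

# Toy data standing in for real model outputs: 4 positions, 32k-token vocab.
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32000))
quant = ref + rng.normal(scale=0.05, size=ref.shape)  # simulated quantization noise
print(f"mean KL: {mean_kl(ref, quant):.6f}")
```

In practice the logits come from running the F16 and quantized GGUFs over the same evaluation text; llama.cpp's llama-perplexity tool ships a --kl-divergence mode (with --kl-divergence-base to save the reference logits) that performs this comparison natively.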
Could not retrieve the full article text. Read the full post on Reddit (r/LocalLLaMA): https://www.reddit.com/r/LocalLLaMA/comments/1s9vzry/apex_moe_quantized_models_boost_with_33_faster/
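The post doesn't detail how APEX picks precisions, but the name "Adaptive Precision for EXpert Models" points at per-expert bit-width selection. Purely as a hypothetical illustration (the function, scores, and budget below are invented, not APEX's algorithm): score each expert's sensitivity to quantization, then greedily grant more bits to the most sensitive experts under an average-bit budget.

```python
# Hypothetical illustration only: the post does not describe APEX's actual
# algorithm. This sketches the general idea behind per-expert adaptive
# precision: sensitive experts get more bits, robust ones fewer, subject
# to an average bit-width budget.
import numpy as np

def assign_expert_bits(sensitivity: np.ndarray,
                       bit_options=(4, 6, 8),
                       budget_bits_per_weight: float = 5.0) -> list[int]:
    """Greedy bit assignment: start every expert at the lowest precision,
    then upgrade experts from most to least sensitive while the average
    bit-width stays within budget."""
    bits = [bit_options[0]] * len(sensitivity)
    for idx in np.argsort(-sensitivity):       # most sensitive first
        for b in bit_options[1:]:              # try progressively higher precision
            trial = bits.copy()
            trial[idx] = b
            if np.mean(trial) <= budget_bits_per_weight:
                bits = trial                   # keep the upgrade if it fits
    return bits

# Example: 8 experts with made-up sensitivity scores.
sens = np.array([0.9, 0.1, 0.4, 0.8, 0.2, 0.05, 0.6, 0.3])
print(assign_expert_bits(sens))  # e.g. [8, 4, 4, 8, 4, 4, 4, 4]
```

A real scheme would derive the sensitivity scores from calibration data, e.g. how often the router activates each expert and how much error quantizing it induces in the model's outputs, rather than from hand-picked numbers.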
