Gemma 4 vs Qwen 3.5 Benchmark Comparison
I took the official benchmarks for Qwen 3.5 and Gemma 4 and compiled them into a side-by-side comparison.

The Benchmark Table

| Benchmark | Qwen 2B | Gemma E2B | Qwen 4B | Gemma E4B | Qwen 27B | Gemma 31B | Qwen 35B (MoE) | Gemma 26B (MoE) |
|---|---|---|---|---|---|---|---|---|
| MMLU-Pro | 66.5% | 60.0% | 79.1% | 69.4% | 86.1% | 85.2% | 85.3% | 82.6% |
| GPQA Diamond | 51.6% | 43.4% | 76.2% | 58.6% | 85.5% | 84.3% | 84.2% | 82.3% |
| LiveCodeBench v6 | 69.4% | 44.0% | 55.8% | 52.0% | 80.7% | 80.0% | 74.6% | 77.1% |
| Codeforces ELO | N/A | 633 | 24.1 | 940 | 1899 | 2150 | 2028 | 1718 |
| TAU2-Bench | 48.8% | 24.5% | 79.9% | 42.2% | 79.0% | 76.9% | 81.2% | 68.2% |
| MMMLU (Multilingual) | 63.1% | 60.0% | 76.1% | 69.4% | 85.9% | 85.2% | 85.2% | 86.3% |
| HLE-n (No tools) | N/A | N/A | N/A | N/A | 24.3% | 19.5% | 22.4% | 8.7% |
| HLE-t (With tools) | N/A | N/A | N/A | N/A | 48.5% | 26.5% | 47.4% | 17.2% |
| AIME 2026 | N/A | N/A | N/A | 42.5% | N/A | 89.2% | N/A | 88.3% |
| MMMU Pro (Vision) | N/A | N/A | N/A | N/A | 75.0% | 76.9% | | |
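To read the gaps more easily, here is a minimal Python sketch that subtracts each Gemma score from the Qwen score it is paired with in the table. The pairing (2B vs E2B, 4B vs E4B, 27B vs 31B, 35B MoE vs 26B MoE) simply follows the column order of the post. The names `SCORES`, `PAIRS`, `qwen_minus_gemma`, and `gaps` are made up for this illustration. Only the percentage rows with all eight values are included, so Codeforces ELO, both HLE rows, AIME 2026, and MMMU Pro are left out.

```python
# Scores copied from the table in the post (percent).
# Column order per row: Qwen 2B, Gemma E2B, Qwen 4B, Gemma E4B,
#                       Qwen 27B, Gemma 31B, Qwen 35B (MoE), Gemma 26B (MoE)
PAIRS = ["2B vs E2B", "4B vs E4B", "27B vs 31B", "35B MoE vs 26B MoE"]

SCORES = {
    "MMLU-Pro": [66.5, 60.0, 79.1, 69.4, 86.1, 85.2, 85.3, 82.6],
    "GPQA Diamond": [51.6, 43.4, 76.2, 58.6, 85.5, 84.3, 84.2, 82.3],
    "LiveCodeBench v6": [69.4, 44.0, 55.8, 52.0, 80.7, 80.0, 74.6, 77.1],
    "TAU2-Bench": [48.8, 24.5, 79.9, 42.2, 79.0, 76.9, 81.2, 68.2],
    "MMMLU": [63.1, 60.0, 76.1, 69.4, 85.9, 85.2, 85.2, 86.3],
}


def qwen_minus_gemma(row):
    """Return the Qwen minus Gemma gap, in points, for each adjacent column pair."""
    return [round(row[i] - row[i + 1], 1) for i in range(0, len(row), 2)]


def gaps(scores=SCORES):
    """Map each benchmark name to its list of per-pair gaps."""
    return {name: qwen_minus_gemma(row) for name, row in scores.items()}


if __name__ == "__main__":
    for name, deltas in gaps().items():
        cells = ", ".join(f"{pair}: {d:+.1f}" for pair, d in zip(PAIRS, deltas))
        print(f"{name}: {cells}")
```

A positive number means Qwen scored higher on that benchmark for that pair, and a negative number means Gemma did. The paired models differ in parameter count, so the gaps are a reading aid and not a controlled comparison.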
Reddit r/LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/1sbp8ny/gemma_4_vs_qwen_35_benchmark_comparison/
