
Gemma 4 vs Qwen 3.5 Benchmark Comparison

Reddit r/LocalLLaMA · by /u/Fuzzy_Philosophy_606 (https://www.reddit.com/user/Fuzzy_Philosophy_606) · April 3, 2026

I took the official benchmarks for Qwen 3.5 and Gemma 4 and compiled them into a head-to-head comparison, pairing models of similar size.

The Benchmark Table

| Benchmark | Qwen 2B | Gemma E2B | Qwen 4B | Gemma E4B | Qwen 27B | Gemma 31B | Qwen 35B (MoE) | Gemma 26B (MoE) |
|---|---|---|---|---|---|---|---|---|
| MMLU-Pro | 66.5% | 60.0% | 79.1% | 69.4% | 86.1% | 85.2% | 85.3% | 82.6% |
| GPQA Diamond | 51.6% | 43.4% | 76.2% | 58.6% | 85.5% | 84.3% | 84.2% | 82.3% |
| LiveCodeBench v6 | 69.4% | 44.0% | 55.8% | 52.0% | 80.7% | 80.0% | 74.6% | 77.1% |
| Codeforces Elo | N/A | 633 | 24.1 | 940 | 1899 | 2150 | 2028 | 1718 |
| TAU2-Bench | 48.8% | 24.5% | 79.9% | 42.2% | 79.0% | 76.9% | 81.2% | 68.2% |
| MMMLU (Multilingual) | 63.1% | 60.0% | 76.1% | 69.4% | 85.9% | 85.2% | 85.2% | 86.3% |
| HLE-n (No tools) | N/A | N/A | N/A | N/A | 24.3% | 19.5% | 22.4% | 8.7% |
| HLE-t (With tools) | N/A | N/A | N/A | N/A | 48.5% | 26.5% | 47.4% | 17.2% |
| AIME 2026 | N/A | N/A | N/A | 42.5% | N/A | 89.2% | N/A | 88.3% |
| MMMU Pro (Vision) | N/A | N/A | N/A | N/A | 75.0% | 76.9% | | |
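One way to summarize the table is a per-tier win count. Here is a minimal Python sketch that tallies, for each size pairing, how many benchmarks each model leads on. It assumes higher is better for every row (true for the accuracy percentages and for Codeforces Elo), skips any benchmark not reported for both models in a pairing, and copies every value verbatim from the table, including the MMMU Pro row that is cut off in the post. The `tally` helper, `SCORES`, and the tier labels are hypothetical, added for illustration only.

```python
from typing import Optional

# Hypothetical tier labels; each tuple is (label, Qwen column index, Gemma column index),
# following the left-to-right column order of the benchmark table.
TIERS = [
    ("Qwen 2B vs Gemma E2B", 0, 1),
    ("Qwen 4B vs Gemma E4B", 2, 3),
    ("Qwen 27B vs Gemma 31B", 4, 5),
    ("Qwen 35B (MoE) vs Gemma 26B (MoE)", 6, 7),
]

# Scores copied verbatim from the table; None marks N/A or a cell missing from the post.
SCORES: dict[str, list[Optional[float]]] = {
    "MMLU-Pro": [66.5, 60.0, 79.1, 69.4, 86.1, 85.2, 85.3, 82.6],
    "GPQA Diamond": [51.6, 43.4, 76.2, 58.6, 85.5, 84.3, 84.2, 82.3],
    "LiveCodeBench v6": [69.4, 44.0, 55.8, 52.0, 80.7, 80.0, 74.6, 77.1],
    "Codeforces Elo": [None, 633, 24.1, 940, 1899, 2150, 2028, 1718],
    "TAU2-Bench": [48.8, 24.5, 79.9, 42.2, 79.0, 76.9, 81.2, 68.2],
    "MMMLU (Multilingual)": [63.1, 60.0, 76.1, 69.4, 85.9, 85.2, 85.2, 86.3],
    "HLE-n (No tools)": [None, None, None, None, 24.3, 19.5, 22.4, 8.7],
    "HLE-t (With tools)": [None, None, None, None, 48.5, 26.5, 47.4, 17.2],
    "AIME 2026": [None, None, None, 42.5, None, 89.2, None, 88.3],
    "MMMU Pro (Vision)": [None, None, None, None, 75.0, 76.9, None, None],
}


def tally(scores: dict[str, list[Optional[float]]],
          tiers: list[tuple[str, int, int]]) -> dict[str, tuple[int, int, int]]:
    """Return {tier label: (Qwen wins, Gemma wins, ties)}, assuming higher is better."""
    results: dict[str, tuple[int, int, int]] = {}
    for label, q_col, g_col in tiers:
        qwen_wins = gemma_wins = ties = 0
        for row in scores.values():
            q, g = row[q_col], row[g_col]
            if q is None or g is None:
                continue  # benchmark not reported for both models in this tier
            if q > g:
                qwen_wins += 1
            elif g > q:
                gemma_wins += 1
            else:
                ties += 1
        results[label] = (qwen_wins, gemma_wins, ties)
    return results


if __name__ == "__main__":
    for label, (q, g, t) in tally(SCORES, TIERS).items():
        print(f"{label}: Qwen {q}, Gemma {g}, ties {t}")
```

Win counts ignore margins: a 0.7-point lead on LiveCodeBench counts the same as a 22-point lead on HLE-t, so they are best read alongside the raw numbers rather than instead of them.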
