
Gemma 4 vs Qwen 3.5 Benchmark Comparison

Posted to Reddit r/LocalLLaMA by /u/Fuzzy_Philosophy_606 (https://www.reddit.com/user/Fuzzy_Philosophy_606) on April 3, 2026

I took the official benchmarks for Qwen 3.5 and Gemma 4 and compiled them into a side-by-side comparison here.

The Benchmark Table

| Benchmark | Qwen 2B | Gemma E2B | Qwen 4B | Gemma E4B | Qwen 27B | Gemma 31B | Qwen 35B (MoE) | Gemma 26B (MoE) |
|---|---|---|---|---|---|---|---|---|
| MMLU-Pro | 66.5% | 60.0% | 79.1% | 69.4% | 86.1% | 85.2% | 85.3% | 82.6% |
| GPQA Diamond | 51.6% | 43.4% | 76.2% | 58.6% | 85.5% | 84.3% | 84.2% | 82.3% |
| LiveCodeBench v6 | 69.4% | 44.0% | 55.8% | 52.0% | 80.7% | 80.0% | 74.6% | 77.1% |
| Codeforces ELO | N/A | 633 | 24.1 | 940 | 1899 | 2150 | 2028 | 1718 |
| TAU2-Bench | 48.8% | 24.5% | 79.9% | 42.2% | 79.0% | 76.9% | 81.2% | 68.2% |
| MMMLU (Multilingual) | 63.1% | 60.0% | 76.1% | 69.4% | 85.9% | 85.2% | 85.2% | 86.3% |
| HLE-n (No tools) | N/A | N/A | N/A | N/A | 24.3% | 19.5% | 22.4% | 8.7% |
| HLE-t (With tools) | N/A | N/A | N/A | N/A | 48.5% | 26.5% | 47.4% | 17.2% |
| AIME 2026 | N/A | N/A | N/A | 42.5% | N/A | 89.2% | N/A | 88.3% |
| MMMU Pro (Vision) | N/A | N/A | N/A | N/A | 75.0% | 76.9% | | |

Original source

Reddit r/LocalLLaMA

https://www.reddit.com/r/LocalLLaMA/comments/1sbp8ny/gemma_4_vs_qwen_35_benchmark_comparison/
