[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on Blackwell

Reddit r/MachineLearning, by /u/carolinedfrasca (https://www.reddit.com/user/carolinedfrasca), April 2, 2026

Google DeepMind dropped Gemma 4 today:

- Gemma 4 31B: dense, 256K context, redesigned architecture targeting efficiency and long-context quality
- Gemma 4 26B A4B: MoE, 26B total / 4B active per forward pass, 256K context

Both are natively multimodal (text, image, video, dynamic resolution).

We got both running on MAX on launch day across NVIDIA B200 and AMD MI355X from the same stack. On B200 we're seeing 15% higher output throughput vs. vLLM (happy to share more on methodology if useful).

Free playground if you want to test without spinning anything up: https://www.modular.com/#playground

Original source

https://www.reddit.com/r/MachineLearning/comments/1saot07/p_gemma_4_running_on_nvidia_b200_and_amd_mi355x/
