Gemma 4 31B beats several frontier models on the FoodTruck Bench

More about

claudemodel

Bench 2xMI50 Qwen3.5 27b vs Gemma4 31B (vllm-gfx906-mobydick)

Inference engine used (vllm fork) : https://github.com/ai-infos/vllm-gfx906-mobydick/tree/main Huggingface Quants used: QuantTrio/Qwen3.5-27B-AWQ vs cyankiwi/gemma-4-31B-it-AWQ-4bit Relevant commands to run : docker run -it --name vllm-gfx906-mobydick -v ~/llm/models:/models --network host --device=/dev/kfd --device=/dev/dri --group-add video --group-add $(getent group render | cut -d: -f3) --ipc=host aiinfos/vllm-gfx906-mobydick:latest FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" OMP_NUM_THREADS=4 VLLM_LOGGING_LEVEL=DEBUG vllm serve \ /models/gemma-4-31B-it-AWQ-4bit \ --served-model-name gemma-4-31B-it-AWQ-4bit \ --dtype float16 \ --max-model-len auto \ --gpu-memory-utilization 0.95 \ --enable-auto-tool-choice \ --tool-call-parser gemma4 \ --reasoning-parser gemma4 \ --mm-processor-cache-gb 1

Reddit r/LocalLLaMA

3mabout 4 hours ago

Open Source AILive

Vllm gemma4 26b a4b it-nvfp4 run success

#!/usr/bin/env bash set -euo pipefail BASE_DIR="/mnt/d/AI/docker-gemma4" PATCH_DIR="$BASE_DIR/nvfp4_patch" BUILD_DIR="$BASE_DIR/build" HF_CACHE_DIR="$BASE_DIR/hf-cache" LOG_DIR="$BASE_DIR/logs" PATCH_FILE="$PATCH_DIR/gemma4_patched.py" DOCKERFILE_PATH="$BUILD_DIR/Dockerfile" BASE_IMAGE="vllm/vllm-openai:gemma4" PATCHED_IMAGE="vllm-gemma4-nvfp4-patched" CONTAINER_NAME="vllm-gemma4-nvfp4" MODEL_ID="bg-digitalservices/Gemma-4-26B-A4B-it-NVFP4" SERVED_MODEL_NAME="gemma-4-26b-a4b-it-nvfp4" GPU_MEMORY_UTILIZATION="0.88" MAX_MODEL_LEN="512" MAX_NUM_SEQS="1" PORT=" " PATCH_URL=" https://huggingface.co/bg-digitalservices/Gemma-4-26B-A4B-it-NVFP4/resolve/main/gemma4_patched.py?download=true " if [[ -z "${HF_TOKEN:-}" ]]; then echo "[ERROR] HF_TOKEN environment variable is empty." echo "Please run th

Reddit r/LocalLLaMA

2mabout 1 hour ago

ModelsLive

Prompts you use to test/trip up your LLMs

I'm obsessed with finding prompts to test the quality of different local models. I've pretty much landed on several that I use across the board. Tell me about the Apple A6 (a pass is if it mentions Apple made their own microarchitecture called swift for the CPU cores, the main thing that the A6 is historically known for as the first Apple SOC to do it. This tests if it is smart enough to mention historically relevant information first) Tell me about the history of Phoenix's freeway network (A pass is if it gives a historical narration instead of just listing freeways. We asked for history, after all. Again, testing for its understanding of putting relevant information first.) Tell me about the Pentium D. Why was it a bad processor ( A pass is it it mentions that it glued two separate penti

Reddit r/LocalLLaMA

4mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 167 connections

Scroll to zoom · drag to pan · click to open

More in Models

Models

Vietnam IP law change fuels debate, uncertainty over AI training data - MLex

Vietnam IP law change fuels debate, uncertainty over AI training data MLex

Google News - AI Vietnam

1m4 months ago

ModelsFresh

Bench 2xMI50 Qwen3.5 27b vs Gemma4 31B (vllm-gfx906-mobydick)

Inference engine used (vllm fork) : https://github.com/ai-infos/vllm-gfx906-mobydick/tree/main Huggingface Quants used: QuantTrio/Qwen3.5-27B-AWQ vs cyankiwi/gemma-4-31B-it-AWQ-4bit Relevant commands to run : docker run -it --name vllm-gfx906-mobydick -v ~/llm/models:/models --network host --device=/dev/kfd --device=/dev/dri --group-add video --group-add $(getent group render | cut -d: -f3) --ipc=host aiinfos/vllm-gfx906-mobydick:latest FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" OMP_NUM_THREADS=4 VLLM_LOGGING_LEVEL=DEBUG vllm serve \ /models/gemma-4-31B-it-AWQ-4bit \ --served-model-name gemma-4-31B-it-AWQ-4bit \ --dtype float16 \ --max-model-len auto \ --gpu-memory-utilization 0.95 \ --enable-auto-tool-choice \ --tool-call-parser gemma4 \ --reasoning-parser gemma4 \ --mm-processor-cache-gb 1

Reddit r/LocalLLaMA

3mabout 4 hours ago

ModelsFresh

RTX 5090 gemma4-26b TG performance report

Nothing exhaustive... but I thought I'd report what I've seen from early testing. I'm running a modified version of vLLM that has NVFP4 support for gemma4-26b. Weights come in around 15.76 GiB and the remainder is KV cache. I'm running full context as well. For a "story telling" prompt and raw output with no thinking, I'm seeing about 150 t/s on TG. TTFT in streaming mode is about 80ms. Quality is good! submitted by /u/Nice_Cellist_7595 [link] [comments]

Reddit r/LocalLLaMA

1mabout 2 hours ago

ModelsLive

Prompts you use to test/trip up your LLMs

I'm obsessed with finding prompts to test the quality of different local models. I've pretty much landed on several that I use across the board. Tell me about the Apple A6 (a pass is if it mentions Apple made their own microarchitecture called swift for the CPU cores, the main thing that the A6 is historically known for as the first Apple SOC to do it. This tests if it is smart enough to mention historically relevant information first) Tell me about the history of Phoenix's freeway network (A pass is if it gives a historical narration instead of just listing freeways. We asked for history, after all. Again, testing for its understanding of putting relevant information first.) Tell me about the Pentium D. Why was it a bad processor ( A pass is it it mentions that it glued two separate penti

Reddit r/LocalLLaMA

4mabout 1 hour ago

Gemma 4 31B beats several frontier models on the FoodTruck Bench

Daily AI Digest

More about

Bench 2xMI50 Qwen3.5 27b vs Gemma4 31B (vllm-gfx906-mobydick)

Vllm gemma4 26b a4b it-nvfp4 run success

Prompts you use to test/trip up your LLMs

Knowledge Map

Connected Articles — Knowledge Graph

Discussion

More in Models

Vietnam IP law change fuels debate, uncertainty over AI training data - MLex

Bench 2xMI50 Qwen3.5 27b vs Gemma4 31B (vllm-gfx906-mobydick)

RTX 5090 gemma4-26b TG performance report

Prompts you use to test/trip up your LLMs