Gemma 4 is great at real-time Japanese - English translation for games
When Gemma 3 27B QAT IT was released last year, it was SOTA for local real-time Japanese-English translation for visual novel for a while. So I want to see how Gemma 4 handle this use case. Model: Unsloth's gemma-4-26B-A4B-it-UD-Q5_K_M Context: 8192 Reasoning: OFF Softwares: Front end: Luna Translator Back end: LM Studio Workflow: Luna hooks the dialogue and speaker's name from the game. A Python script structures the hooked text (add name, gender). Luna sends the structured text and a system prompt to LM Studio Luna shows the translation. What Gemma 4 does great: Even with reasoning disabled, Gemma 4 follows instructions in system prompt very well. With structured text, gemma 4 deals with pronouns well. This is one of the biggest challenges because Japanese spoken dialogue often omit subj
Could not retrieve the full article text.
Read on Reddit r/LocalLLaMA →Reddit r/LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/1sbiqx3/gemma_4_is_great_at_realtime_japanese_english/Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modelreleasejapan
Understanding digital health technology implementation in rehabilitation and development of the Rehabilitation Technologies Implementation model
npj Digital Medicine, Published online: 04 April 2026; doi:10.1038/s41746-026-02599-1 Understanding digital health technology implementation in rehabilitation and development of the Rehabilitation Technologies Implementation model

Speculative decoding works great for Gemma 4 31B in llama.cpp
I get a ~11% speed up with Gemma 3 270B as the draft model. Try it by adding: --no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Testing with (on a 3090): ./build/bin/llama-cli -hf unsloth/gemma-4-31B-it-GGUF:Q4_1 --jinja --temp 1.0 --top-p 0.95 --top-k 64 -ngl 1000 -st -f prompt.txt --no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Gave me: [ Prompt: 607.3 t/s | Generation: 36.6 t/s ] draft acceptance rate = 0.44015 ( 820 accepted / 1863 generated) vs. [ Prompt: 613.8 t/s | Generation: 32.9 t/s ] submitted by /u/Leopold_Boom [link] [comments]

Gemma 4 - 4B vs Qwen 3.5 - 9B ?
Hello! anyone tried the 4B Gemma 4 model and the Qwen 3.5 9B model and can tell us their feedback? On the benchmark Qwen seems to be doing better, but I would appreciate any personal experience on the matter Thanks! submitted by /u/No-Mud-1902 [link] [comments]
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Open Source AI

Speculative decoding works great for Gemma 4 31B in llama.cpp
I get a ~11% speed up with Gemma 3 270B as the draft model. Try it by adding: --no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Testing with (on a 3090): ./build/bin/llama-cli -hf unsloth/gemma-4-31B-it-GGUF:Q4_1 --jinja --temp 1.0 --top-p 0.95 --top-k 64 -ngl 1000 -st -f prompt.txt --no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Gave me: [ Prompt: 607.3 t/s | Generation: 36.6 t/s ] draft acceptance rate = 0.44015 ( 820 accepted / 1863 generated) vs. [ Prompt: 613.8 t/s | Generation: 32.9 t/s ] submitted by /u/Leopold_Boom [link] [comments]

Gemma 4 fixes in llama.cpp
There have already been opinions that Gemma is bad because it doesn’t work well, but you probably aren’t using the transformers implementation, you’re using llama.cpp. After a model is released, you have to wait at least a few days for all the fixes in llama.cpp, for example: https://github.com/ggml-org/llama.cpp/pull/21418 https://github.com/ggml-org/llama.cpp/pull/21390 https://github.com/ggml-org/llama.cpp/pull/21406 https://github.com/ggml-org/llama.cpp/pull/21327 https://github.com/ggml-org/llama.cpp/pull/21343 ...and maybe there will be more? I had a looping problem in chat, but I also tried doing some stuff in OpenCode (it wasn’t even coding), and there were zero problems. So, probably just like with GLM Flash, a better prompt somehow fixes the overthinking/looping. submitted by /u/

I Tested Every Gemma 4 Model Locally on My MacBook - What Actually Works
Audio ASR in 3 languages, image understanding, full-stack app generation, coding, and agentic behavior -- all running on a MacBook M4 Pro with 24GB RAM. Interactive version with playable audio, live charts, and the working React app: gemma4-benchmark.pages.dev Google just released Gemma 4 -- their new family of open-source multimodal models. Four sizes, Apache-2.0 licensed, supports text + image + audio. I spent a day testing every variant. Real audio files. Real images. Code that has to compile and run. Here is my honest report. The Gemma 4 Family E2B -- Dense 2.3B, Text/Image/Audio, 4 GB at 4-bit. Phones and edge. E4B -- Dense 4.5B, Text/Image/Audio, 5.5 GB at 4-bit. Laptops. 26B-A4B -- MoE 4B active/26B total, Text/Image, 16-18 GB at 4-bit. 31B -- Dense 31B, Text/Image, 17-20 GB at 4-bi


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!