Google - Gemma 4 now in Unsloth!
- Google releases Gemma 4 with four new models: E2B, E4B, 26B-A4B, and 31B.
- You can now run and train the Gemma 4 models in Unsloth (a minimal loading sketch follows this list). Guide / Blog: https://unsloth.ai/docs/models/gemma-4
- GGUFs: https://huggingface.co/collections/unsloth/gemma-4
- The multimodal reasoning models are licensed under Apache 2.0.
- Run E2B and E4B on 6GB of RAM, and on phones. Run 26B-A4B and 31B on ~18GB.
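As a reference point, loading and preparing one of these checkpoints typically follows Unsloth's standard `FastLanguageModel` pattern. This is a minimal sketch: the repo name `unsloth/gemma-4-E4B` and the settings below are assumptions, so check the guide linked above for the exact identifiers.

```python
from unsloth import FastLanguageModel

# Assumed repo name for illustration; see the guide for the real identifiers.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-E4B",
    max_seq_length=4096,   # raise if you need longer context
    load_in_4bit=True,     # 4-bit loading helps the E-series fit in small RAM budgets
)

# For inference:
FastLanguageModel.for_inference(model)

# For training, attach LoRA adapters before handing the model to your trainer:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```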
Updates
- Tool calls for smaller models are now more stable and no longer get cut off.
- Context length is now properly applied.
- Tool calls for all models are now 30% to 80% more accurate.
- Web search now retrieves actual page content, not just summaries.
- The maximum number of tool calls is increased from 10 to 25.
- Tool calls now terminate more reliably, reducing looping and repetition.
- More tool-call healing and de-duplication logic to stop tool calls from leaking XML into responses (a sketch of this kind of cleanup follows this list).
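To make the XML-leak fix concrete: the cleanup described above amounts to extracting tool-call payloads, dropping exact duplicates, capping the call count, and stripping any leftover (possibly truncated) XML from the user-visible answer. The sketch below is illustrative only; the `<tool_call>` tag name and the parsing are assumptions, not Unsloth's actual implementation.

```python
import re

# Hypothetical tag name; many chat templates wrap tool calls in <tool_call> ... </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def heal_response(text: str, max_calls: int = 25) -> tuple[str, list[str]]:
    """Extract tool-call payloads, de-duplicate exact repeats, cap the count,
    and strip leftover XML so it never leaks into the visible answer."""
    calls, seen = [], set()
    for payload in TOOL_CALL_RE.findall(text):
        payload = payload.strip()
        if payload and payload not in seen and len(calls) < max_calls:
            seen.add(payload)
            calls.append(payload)
    # Remove complete tag pairs, then any unterminated trailing fragment
    # (the "cut off" case) from the prose.
    cleaned = TOOL_CALL_RE.sub("", text)
    cleaned = re.sub(r"<tool_call>.*\Z", "", cleaned, flags=re.DOTALL).strip()
    return cleaned, calls
```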
-
Tested with unsloth/Qwen3.5-4B-GGUF (UD-Q4_K_XL), with web search, code execution, and thinking enabled:

| Metric | Before | After |
| --- | --- | --- |
| XML leaks in response | 10/10 | 0/10 |
| URL fetches used | 0 | 4/10 runs |
| Runs with correct song names | 0/10 | 2/10 |
| Avg tool calls | 5.5 | 3.8 |
| Avg response time | 12.3s | 9.8s |
Run Gemma 4 in Unsloth Studio:
What's Changed
- studio: Polish Windows installer/setup logs by @Imagineer99 in #4736
- feat: move folder management into model selector dropdown by @Shine1i in #4731
- fix: clear tool status badge immediately after tool execution by @Shine1i in #4733
- refactor flex attn to prefer flash if possible by @Datta0 in #4734
- Fix Windows local GGUF model loading crash by @danielhanchen in #4730
- Fix OOM model styling in Studio model selectors by @LeoBorcherding in #4738
- feat(studio): strip org prefix in model search to surface unsloth variants by @rolandtannous in #4749
- Fix forward compatibility with transformers 5.x by @danielhanchen in #4752
- Architecture-aware KV cache VRAM estimation (5-path) by @danielhanchen in #4757 (see the KV-cache sizing sketch after this list)
- Fix save_pretrained_merged for full-finetuned models by @danielhanchen in #4755
- Feat/prebuiltllamacpp by @mmathew23 in #4741
- Add installer test coverage for prebuilt llama.cpp changes by @danielhanchen in #4756
- fix: studio web search SSL failures and empty page content by @danielhanchen in #4754
- fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ by @danielhanchen in #4748
- fix(studio): allow context length slider to reach model's native limit by @danielhanchen in #4746
- Tests for architecture-aware KV cache estimation by @danielhanchen in #4760
- Fix custom llama.cpp source builds and macos metal source builds by @mmathew23 in #4762
- studio: align composer/code, unify fonts, and remove tool collapse jitter by @Imagineer99 in #4763
- fix(chat): correct loading text for cached models during inference by @AdamPlatin123 in #4764
- fix(security): shell injection in GGML export conversion by @mateeaaaaaaa in #4768
- Add regression test for shell injection fix in GGML conversion by @danielhanchen in #4773
- fix(studio): prevent small models from stalling on tool-calling tasks by @danielhanchen in #4769
- Add regression tests for custom llama prebuilt installer by @danielhanchen in #4772
- Feat/custom llama prebuilt by @mmathew23 in #4771
- studio: fix chat font changes leaking outside chat page by @Imagineer99 in #4775
- feat(studio): display images from Python tool execution in chat UI by @danielhanchen in #4778
- ui improvement by @rolandtannous in #4781
- UI Changes by @danielhanchen in #4782
- fix(studio): improve tool-calling re-prompt for small models by @danielhanchen in #4783
- Pin Gemma-4 transformers requirement to 5.5.0 stable by @danielhanchen in #4784
- Switch llama.cpp default to mainline ggml-org by @danielhanchen in #4785
- Use transformers v5.5-release branch, pin to 5.5.0 by @danielhanchen in #4786
- Fix: pin transformers==4.57.6 in main Studio venv by @danielhanchen in #4788
- fix(studio): build llama.cpp from master for Gemma 4 support by @danielhanchen in #4790
- fix name fixed name by @rolandtannous in #4791
- fix(studio): prioritize curated defaults in Recommended model list by @danielhanchen in #4792
- fix windows llama.cpp compile from source issue by @mmathew23 in #4793
- fix(studio): pin llama.cpp to b8637 (Gemma 4 support) by @danielhanchen in #4796
- fix(studio): don't set trust_remote_code for Gemma 4 training by @danielhanchen in #4795
- fix(studio): revert llama.cpp default tag to latest by @danielhanchen in #4797
- fix(studio): suppress fatal error when ggml-org has no prebuilt manifest by @danielhanchen in #4799
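On #4757: the release notes don't describe the five estimation paths, but the baseline arithmetic for a standard full-attention transformer is easy to state, and it shows why the KV cache dominates VRAM at long context. A hypothetical sketch, with all model dimensions made up for illustration:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Keys + values, one pair per layer:
    2 (K and V) * layers * kv_heads * head_dim * tokens * element size."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Made-up dimensions: a 32-layer model with 8 KV heads, head_dim 128,
# an 8192-token context, and an f16 (2-byte) cache.
gib = kv_cache_bytes(32, 8, 128, 8192, 2) / 2**30
print(f"{gib:.2f} GiB")  # 1.00 GiB
```

An architecture-aware estimator would branch where this flat formula breaks down, for example sliding-window layers that cap the number of cached tokens per layer.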
New Contributors
- @AdamPlatin123 made their first contribution in #4764
- @mateeaaaaaaa made their first contribution in #4768
Full Changelog: v0.1.3-beta...v0.1.35-beta