v5.5.0
update release workflow
Release v5.5.0
New Model additions
Gemma4
Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as the previous Gemma versions. The key differences are a vision processor that can output images of fixed token budget and a spatial 2D RoPE to encode vision-specific information across height and width axis.
You can find all the original Gemma 4 checkpoints under the Gemma 4 release.
The key difference from previous Gemma releases is the new design to process images of different sizes using a fixed-budget number of tokens. Unlike many models that squash every image into a fixed square (like 224×224), Gemma 4 keeps the image's natural aspect ratio while making it the right size. There a a couple constraints to follow:
-
The total number of pixels must fit within a patch budget
-
Both height and width must be divisible by 48 (= patch size 16 × pooling kernel 3)
Important
Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).
The number of "soft tokens" (aka vision tokens) an image processor can produce is configurable. The supported options are outlined below and the default is 280 soft tokens per image.
Soft Tokens Patches (before pooling) Approx. Image Area
70 630 ~161K pixels
140 1,260 ~323K pixels
280 2,520 ~645K pixels
560 5,040 ~1.3M pixels
1,120 10,080 ~2.6M pixels
To encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The position table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector of the same dimensions as the patch embedding. The 2D RoPE which Gemma 4 uses independently rotate half the attention head dimensions for the x-axis and the other half for the y-axis. This allows the model to understand spatial relationships like "above," "below," "left of," and "right of."
NomicBERT
NomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long context text embeddings. It is the first fully reproducible, open-source text embedding model with 8192 context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short-context MTEB and long context LoCo benchmarks. The model generates dense vector embeddings for various tasks including search, clustering, and classification using specific instruction prefixes.
Links: Documentation | Paper
- Internalise the NomicBERT model (#43067) by @ed22699 in #43067
MusicFlamingo
Music Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds upon the Audio Flamingo 3 architecture by including Rotary Time Embeddings (RoTE), which injects temporal position information to enable the model to handle audio sequences up to 20 minutes. The model features a unified audio encoder across speech, sound, and music with special sound boundary tokens for improved audio sequence modeling.
Links: Documentation | Paper
- Add Music Flamingo (#43538) by @lashahub in #43538
Breaking changes
Mamba and hybrid model caches are now first-class native citizens in the library, so users working with Mamba-based or hybrid (Mamba + attention) models should update their code to use the new native cache classes instead of any previous workarounds.
- 🚨 [Cache] Native mamba & hybrid cache (#44950) by @Cyrilvallez
Remote code execution support has been removed from the native LightGlue integration, so users who were loading LightGlue with trust_remote_code=True must remove that argument and use the model directly through the standard native API.
- 🚨 [LightGlue] Remove remote code execution (#45122) by @vasqu
Vision
Several vision-related bugs were fixed in this release, including correcting the Gemma vision mask to support video inputs, resolving a dependency issue that incorrectly required torchvision for PIL-based image processors, and patching bugs in the Janus image generation model and image loading. Local code resolution for tokenizers and image processors was also corrected.
-
Generalize gemma vision mask to videos (#45185) by @zucchini-nlp in [#45185]
-
Fix explicit local code resolution for tokenizers and image processors (#45169) by @hmellor in [#45169]
-
fix bug for janus model image generation (#45044) by @kaixuanliu in [#45044]
-
[Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045) by @Lidang-Jiang in [#45045]
-
Avoid Image.open failure (#44645) by @sywangyi in [#44645]
Cache
Improved the performance of repository checks (check-repo) by introducing file-level and AST-level disk caching, achieving up to a 27x speedup (from ~46s to ~1.6s with a warm cache), and fixed the mlinter cache location in .gitignore.
-
refactoring: speedup static checks with disk cache (#44992) by @tarekziade in [#44992]
-
refactor: added cache in check_repo (#45012) by @tarekziade in [#45012]
-
chore: Fix mlinter cache location (#45052) by @tarekziade in [#45052]
Bugfixes and improvements
-
Fix resized LM head weights being overwritten by post_init (#45079) by @javierdejesusda in [#45079]
-
[Qwen3.5 MoE] Add _tp_plan to ForConditionalGeneration (#45124) by @danielquintas8 in [#45124]
-
fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel (#45074) by @harshaljanjani in [#45074]
-
[misc] fix qwen35 tests: correct the text model type and skip reverse_mapping (#45173) by @JJJYmmm in [#45173]
-
🔒 Pin GitHub Actions to commit SHAs (#45180) by @paulinebm in [#45180]
-
Use doc-builder runnable example for GLM-ASR (#44277) by @tarekziade in [#44277]
-
CI] Small T5 expectations updated (#45138) by @Abdennacer-Badaoui in [#45138]
-
fix: correct type annotations across config classes for @strict validation (#45007) by @Krishnachaitanyakc in [#45007]
-
Fix T5Attention shape mismatch under Tensor Parallelism (#45109) by @aws-zhanxun in [#45109]
-
[refactor] Serving into proper modules (#44796) by @SunMarc in [#44796]
-
Re-add regex substitutions to the response parsing spec (#45166) by @Rocketknight1 in [#45166]
-
Fix incorrect TrainingArguments example in training.md (#45150) by @maanas1234 in [#45150]
-
Add parse_response to Processor, make it a bit more official (#45143) by @Rocketknight1 in [#45143]
-
DeepGEMM (#44832) by @IlyasMoutawwakil in [#44832]
-
fix: prefer registered config over remote code in AutoConfig.from_pretrained (#45094) by @HanFa in [#45094]
-
[serving] Fix continuous batching JSON response serialization (#45057) by @NathanHB in [#45057]
-
Fix stupid test fetcher (#45140) by @ydshieh in [#45140]
-
[CB] Add warmup feature (#45112) by @remi-or in [#45112]
-
feature: added import complexity checker (#45013) by @tarekziade in [#45013]
-
Fix tests for janus model (#44739) by @kaixuanliu in [#44739]
-
CB improvements for serving (#45063) by @SunMarc in [#45063]
-
[docs] continuous batching (#44896) by @stevhliu in [#44896]
-
Fix few issues in Qwen_3_Omni_Moe (#44848) by @Sai-Suraj-27 in [#44848]
-
Fix TypeError in rope validation when ignore_keys is a list (#45069) by @Fr0do in [#45069]
-
Remove unused TensorFlow env var (#45065) by @Sai-Suraj-27 in [#45065]
-
fix: add identity reverse_op to dequantize ops for save_pretrained (#44983) by @Hyungkeun-Park-Nota in [#44983]
-
Fix when RoPE params are in kwargs (#45049) by @zucchini-nlp in [#45049]
-
chore: update update_metdata.yml (#45054) by @hf-security-analysis[bot] in [#45054]
-
[FA] Fix BC support for a few versions + add deprecation cycle (#45061) by @vasqu in [#45061]
-
fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI (#45004) by @harshaljanjani in [#45004]
-
Allow advanced users to override model_type in AutoConfig.from_pretrained (#45058) by @hmellor in [#45058]
-
Fix failing SmolLM3IntegrationTest (#45048) by @Sai-Suraj-27 in [#45048]
-
chore: remove old extras (#45024) by @tarekziade in [#45024]
-
Embedding VLMs don't need a head (#45000) by @zucchini-nlp in [#45000]
-
Fix GraniteConfig type hints to accept int for multiplier fields (#45019) by @javierdejesusda in [#45019]
-
fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985) by @Krishnachaitanyakc in [#44985]_
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @ed22699
Internalise the NomicBERT model (#43067)
- @tarekziade
Use doc-builder runnable example for GLM-ASR (#44277) refactoring: speedup static checks with disk cache (#44992) feature: added import complexity checker (#45013) refactor: added cache in check_repo (#45012) chore: remove old extras (#45024) chore: Fix mlinter cache location (#45052) refactor: speed up docstring checker (#45009)
- @Krishnachaitanyakc
fix: correct type annotations across config classes for @strict validation (#45007) fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985)
- @lashahub
Add Music Flamingo (#43538)
- @Lidang-Jiang
[Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045)
HF Transformers Releases
https://github.com/huggingface/transformers/releases/tag/v5.5.0Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
releaseupdate
Are there Multiple Moral Endpoints?
This is a different approach to explaining an old idea . What is the deep future going to look like? Will we be proud of it, or ashamed of the choices that led to it? Lots of focus on the future is on the near future. How will ongoing wars go? What will the next AI model's capabilities be? Will this business succeed or fail? Let's zoom out and focus just on advanced artificial intelligence; my guess is that we'll have a "transition period" with many different relevant actors with different philosophies and moral considerations (we're in it) and then end up in an "equilibrium period" with much more homogeneity of philosophy and moral standpoints. [1] The transition period is hard to predict and involves many contingent facts, and might involve lots of dramatic turnovers; the equilibrium per

Windows 11 will be force-updated to version 25H2 using machine learning
According to Microsoft's latest timeline, Windows 11 25H2 will soon be rolled out to all devices running the Home and Pro editions of Windows 11 24H2. The latter is set to reach the end of official support on October 13, 2026, and Redmond is clearly aiming to move as many... Read Entire Article

Meta Pauses Work With Mercor After LiteLLM-Linked Data Breach
A poisoned LiteLLM update hit Mercor, and Meta pulled the brake. The breach is now a warning flare for AI vendors built on open-source plumbing. The post Meta Pauses Work With Mercor After LiteLLM-Linked Data Breach appeared first on TechRepublic .
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Releases

Windows 11 will be force-updated to version 25H2 using machine learning
According to Microsoft's latest timeline, Windows 11 25H2 will soon be rolled out to all devices running the Home and Pro editions of Windows 11 24H2. The latter is set to reach the end of official support on October 13, 2026, and Redmond is clearly aiming to move as many... Read Entire Article

OpenAI Pushes for Policies to Offset AI’s Impact | Bloomberg Tech 4/6/2026
Bloomberg’s Caroline Hyde and Ed Ludlow speak with OpenAI’s Chief Global Affairs Officer Chris Lehane about the company’s new policy proposals to help manage the rapid changes brought about by artificial intelligence. Plus, Oracle names a new CFO to help the company navigate massive data center development plans and a cash crunch. And, NASA's Artemis II is set to pass at the closest distance that humans have come to the lunar surface in 50 years. (Source: Bloomberg)

Meta Pauses Work With Mercor After LiteLLM-Linked Data Breach
A poisoned LiteLLM update hit Mercor, and Meta pulled the brake. The breach is now a warning flare for AI vendors built on open-source plumbing. The post Meta Pauses Work With Mercor After LiteLLM-Linked Data Breach appeared first on TechRepublic .

Instruction Following and Hallucination Ratings for Gemma 4 - Any metrics available?
I am trying to find hallucination evaluations of Gemma 4? it is not yet available in https://github.com/vectara/hallucination-leaderboard . Anyone have any information? Thanks. submitted by /u/appakaradi [link] [comments]

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!