v5.5.0

HF Transformers Releases by ArthurZucker, April 2, 2026

Release v5.5.0

New Model additions

Gemma4

Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as in previous Gemma versions. The key differences are a vision processor that encodes each image into a fixed token budget and a spatial 2D RoPE that encodes vision-specific position information along the height and width axes.

You can find all the original Gemma 4 checkpoints under the Gemma 4 release.

The main change from previous Gemma releases is how images of different sizes are processed into a fixed budget of tokens. Unlike many models that squash every image into a fixed square (such as 224×224), Gemma 4 preserves the image's natural aspect ratio while resizing it to fit the budget. The resized image must satisfy two constraints:

  • The total number of pixels must fit within a patch budget

  • Both height and width must be divisible by 48 (= patch size 16 × pooling kernel 3)
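The two constraints above can be illustrated with a small helper that resizes an image to fit a soft-token budget while keeping its aspect ratio. This is a minimal sketch, not the Gemma 4 image processor: the function name `fit_to_budget`, the square-root scaling rule, rounding down to multiples of 48, and the fact that it also upscales small images are assumptions for illustration. Only the budget conversion (one soft token per 3×3 block of 16×16 patches) follows from the patch size and pooling kernel stated above.

```python
import math

PATCH_SIZE = 16                       # pixels per patch side
POOL_KERNEL = 3                       # 3x3 patches are pooled into one soft token
MULTIPLE = PATCH_SIZE * POOL_KERNEL   # 48: both sides must be divisible by this


def fit_to_budget(height: int, width: int, max_soft_tokens: int = 280) -> tuple[int, int]:
    """Hypothetical helper: choose a (height, width) that keeps the aspect ratio,
    is divisible by 48 on both sides, and stays within the patch budget."""
    max_patches = max_soft_tokens * POOL_KERNEL ** 2
    max_pixels = max_patches * PATCH_SIZE ** 2

    # Uniform scale factor that would make the area exactly hit the pixel budget.
    scale = math.sqrt(max_pixels / (height * width))

    # Round each side down to a multiple of 48, but never below one 48-pixel block.
    new_h = max(MULTIPLE, math.floor(height * scale / MULTIPLE) * MULTIPLE)
    new_w = max(MULTIPLE, math.floor(width * scale / MULTIPLE) * MULTIPLE)

    # Extreme aspect ratios can overshoot after clamping; shrink the longer side.
    while (new_h // PATCH_SIZE) * (new_w // PATCH_SIZE) > max_patches:
        if new_h >= new_w:
            new_h -= MULTIPLE
        else:
            new_w -= MULTIPLE
    return new_h, new_w


print(fit_to_budget(1080, 1920))  # a 16:9 frame at the default budget -> (576, 1056)
```

For the 1080×1920 example, the result is 36 × 66 = 2,376 patches, which fits under the 2,520-patch budget that 280 soft tokens allow.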

Important

Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).

The number of "soft tokens" (vision tokens) the image processor produces per image is configurable. The supported values are listed below; the default is 280 soft tokens per image.

| Soft tokens | Patches (before pooling) | Approx. image area |
|---|---|---|
| 70 | 630 | ~161K pixels |
| 140 | 1,260 | ~323K pixels |
| 280 | 2,520 | ~645K pixels |
| 560 | 5,040 | ~1.3M pixels |
| 1,120 | 10,080 | ~2.6M pixels |
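The columns of this table are related by simple arithmetic: each soft token pools a 3×3 block of patches, and each patch covers 16×16 pixels. The short check below reproduces the table from those two numbers. `budget_row` is a hypothetical helper, and the "approximate area" is read here as the maximum pixel count the budget allows.

```python
PATCH_SIZE = 16   # pixels per patch side
POOL_KERNEL = 3   # each soft token pools a 3x3 block of patches


def budget_row(soft_tokens: int) -> tuple[int, int]:
    """Return (patches before pooling, maximum pixel area) for a soft-token budget."""
    patches = soft_tokens * POOL_KERNEL ** 2
    pixels = patches * PATCH_SIZE ** 2
    return patches, pixels


for n in (70, 140, 280, 560, 1120):
    patches, pixels = budget_row(n)
    print(f"{n:>5} soft tokens -> {patches:>6,} patches -> {pixels:>10,} pixels")
```

For the default budget, 280 × 9 = 2,520 patches and 2,520 × 256 = 645,120 pixels, which matches the ~645K entry.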

To encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector with the same dimension as the patch embedding. The 2D RoPE used by Gemma 4 independently rotates half of each attention head's dimensions according to the x-axis position and the other half according to the y-axis position. This lets the model capture spatial relationships such as "above," "below," "left of," and "right of."
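The axial 2D RoPE described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Gemma 4's implementation: the frequency base of 10000, rotating interleaved feature pairs (rather than split halves), and assigning the first half of the head dimensions to the x (column) index are assumptions.

```python
import numpy as np


def rope_1d(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive feature pairs of x (shape (n, d), d even) by pos * theta_i."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)       # (d/2,) rotation frequencies
    angles = pos[:, None].astype(float) * inv_freq      # (n, d/2) angle per pair
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin           # standard 2x2 rotation
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def rope_2d(x: np.ndarray, rows: np.ndarray, cols: np.ndarray) -> np.ndarray:
    """First half of each head's dims is rotated by the x (column) index,
    the second half by the y (row) index."""
    half = x.shape[-1] // 2
    return np.concatenate(
        [rope_1d(x[:, :half], cols), rope_1d(x[:, half:], rows)], axis=-1
    )
```

Because each half is a pure rotation, the query–key dot product after `rope_2d` depends only on the row and column offsets between two patches, not on their absolute positions. That relative dependence is what lets attention reason about placement such as "left of" or "above".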

NomicBERT

NomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long-context text embeddings. It is the first fully reproducible, open-source text embedding model with an 8192-token context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on the short-context MTEB and long-context LoCo benchmarks. The model generates dense vector embeddings for tasks including search, clustering, and classification, selected by task-specific instruction prefixes.
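The instruction prefixes mentioned above are plain strings prepended to each input before tokenization; the encoder's token states are then pooled into one vector per text. The sketch below shows only that surrounding logic in NumPy. The prefix strings are taken from Nomic's published nomic-embed-text model card rather than from these release notes, mean pooling is an assumption about how the embeddings are pooled, and the hidden states in the usage lines are random stand-ins for real model outputs.

```python
import numpy as np

# Task prefixes as published on Nomic's model card (assumed, not from these notes).
TASK_PREFIXES = {
    "query": "search_query: ",
    "document": "search_document: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}


def with_prefix(texts: list[str], task: str) -> list[str]:
    """Prepend the task-specific instruction prefix to every input text."""
    return [TASK_PREFIXES[task] + t for t in texts]


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token states (batch, seq, dim) over real tokens only (mask == 1)."""
    m = mask[..., None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / np.clip(m.sum(axis=1), 1e-9, None)


def cosine_scores(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity matrix between L2-normalised query and document vectors."""
    q = queries / np.linalg.norm(queries, axis=-1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=-1, keepdims=True)
    return q @ d.T


print(with_prefix(["what is RoPE?"], "query"))
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 5, 4))           # stand-in for encoder outputs
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])  # first text has 2 padding tokens
emb = mean_pool(hidden, mask)
print(cosine_scores(emb[:1], emb).round(3))
```

Queries and documents get different prefixes so the same encoder can place them in a shared space for retrieval; the padding mask keeps pad tokens from diluting the average.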

Links: Documentation | Paper

  • Internalise the NomicBERT model (#43067) by @ed22699 in #43067

MusicFlamingo

Music Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds on the Audio Flamingo 3 architecture by adding Rotary Time Embeddings (RoTE), which inject temporal position information and enable the model to handle audio sequences up to 20 minutes long. The model uses a unified audio encoder across speech, sound, and music, with special sound boundary tokens for improved audio sequence modeling.

Links: Documentation | Paper

  • Add Music Flamingo (#43538) by @lashahub in #43538

Breaking changes

Mamba and hybrid model caches are now natively supported as first-class cache types in the library. Users working with Mamba-based or hybrid (Mamba + attention) models should update their code to use the new native cache classes instead of any previous workarounds.

  • 🚨 [Cache] Native mamba & hybrid cache (#44950) by @Cyrilvallez

Remote code execution support has been removed from the native LightGlue integration, so users who were loading LightGlue with trust_remote_code=True must remove that argument and use the model directly through the standard native API.

  • 🚨 [LightGlue] Remove remote code execution (#45122) by @vasqu

Vision

Several vision-related bugs were fixed in this release, including correcting the Gemma vision mask to support video inputs, resolving a dependency issue that incorrectly required torchvision for PIL-based image processors, and patching bugs in the Janus image generation model and image loading. Local code resolution for tokenizers and image processors was also corrected.

  • Generalize gemma vision mask to videos (#45185) by @zucchini-nlp in [#45185]

  • Fix explicit local code resolution for tokenizers and image processors (#45169) by @hmellor in [#45169]

  • fix bug for janus model image generation (#45044) by @kaixuanliu in [#45044]

  • [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045) by @Lidang-Jiang in [#45045]

  • Avoid Image.open failure (#44645) by @sywangyi in [#44645]

Cache

Improved the performance of repository checks (check-repo) by introducing file-level and AST-level disk caching, achieving up to a 27x speedup (from ~46s to ~1.6s with a warm cache), and fixed the mlinter cache location in .gitignore.
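The speedup comes from not redoing work for files that have not changed since the last run. A minimal sketch of file-level caching keyed by a content hash is shown below. It is not the code from #44992 or #45012: the function name `top_level_names`, the JSON cache format, and the choice to cache top-level definition names from `ast.parse` are assumptions for illustration.

```python
import ast
import hashlib
import json
from pathlib import Path


def _digest(source: str) -> str:
    """Content hash used as the cache validity key."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def top_level_names(path: Path, cache_file: Path) -> list[str]:
    """Hypothetical check helper: return top-level function/class names in `path`,
    re-parsing the file only when its contents changed since the cached run."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    source = path.read_text(encoding="utf-8")
    key = str(path.resolve())
    digest = _digest(source)

    entry = cache.get(key)
    if entry and entry["sha256"] == digest:
        return entry["names"]  # warm cache: skip ast.parse entirely

    tree = ast.parse(source)  # cold or stale entry: do the expensive work
    names = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    cache[key] = {"sha256": digest, "names": names}
    cache_file.write_text(json.dumps(cache))
    return names
```

Keying on a content hash rather than a modification time keeps the cache correct across fresh checkouts and branch switches, at the cost of reading and hashing each file, which is still much cheaper than parsing it.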

  • refactoring: speedup static checks with disk cache (#44992) by @tarekziade in [#44992]

  • refactor: added cache in check_repo (#45012) by @tarekziade in [#45012]

  • chore: Fix mlinter cache location (#45052) by @tarekziade in [#45052]

Bugfixes and improvements

  • Fix resized LM head weights being overwritten by post_init (#45079) by @javierdejesusda in [#45079]

  • [Qwen3.5 MoE] Add _tp_plan to ForConditionalGeneration (#45124) by @danielquintas8 in [#45124]

  • fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel (#45074) by @harshaljanjani in [#45074]

  • [misc] fix qwen35 tests: correct the text model type and skip reverse_mapping (#45173) by @JJJYmmm in [#45173]

  • 🔒 Pin GitHub Actions to commit SHAs (#45180) by @paulinebm in [#45180]

  • Use doc-builder runnable example for GLM-ASR (#44277) by @tarekziade in [#44277]

  • [CI] Small T5 expectations updated (#45138) by @Abdennacer-Badaoui in [#45138]

  • fix: correct type annotations across config classes for @strict validation (#45007) by @Krishnachaitanyakc in [#45007]

  • Fix T5Attention shape mismatch under Tensor Parallelism (#45109) by @aws-zhanxun in [#45109]

  • [refactor] Serving into proper modules (#44796) by @SunMarc in [#44796]

  • Re-add regex substitutions to the response parsing spec (#45166) by @Rocketknight1 in [#45166]

  • Fix incorrect TrainingArguments example in training.md (#45150) by @maanas1234 in [#45150]

  • Add parse_response to Processor, make it a bit more official (#45143) by @Rocketknight1 in [#45143]

  • DeepGEMM (#44832) by @IlyasMoutawwakil in [#44832]

  • fix: prefer registered config over remote code in AutoConfig.from_pretrained (#45094) by @HanFa in [#45094]

  • [serving] Fix continuous batching JSON response serialization (#45057) by @NathanHB in [#45057]

  • Fix stupid test fetcher (#45140) by @ydshieh in [#45140]

  • [CB] Add warmup feature (#45112) by @remi-or in [#45112]

  • feature: added import complexity checker (#45013) by @tarekziade in [#45013]

  • Fix tests for janus model (#44739) by @kaixuanliu in [#44739]

  • CB improvements for serving (#45063) by @SunMarc in [#45063]

  • [docs] continuous batching (#44896) by @stevhliu in [#44896]

  • Fix few issues in Qwen_3_Omni_Moe (#44848) by @Sai-Suraj-27 in [#44848]

  • Fix TypeError in rope validation when ignore_keys is a list (#45069) by @Fr0do in [#45069]

  • Remove unused TensorFlow env var (#45065) by @Sai-Suraj-27 in [#45065]

  • fix: add identity reverse_op to dequantize ops for save_pretrained (#44983) by @Hyungkeun-Park-Nota in [#44983]

  • Fix when RoPE params are in kwargs (#45049) by @zucchini-nlp in [#45049]

  • chore: update update_metdata.yml (#45054) by @hf-security-analysis[bot] in [#45054]

  • [FA] Fix BC support for a few versions + add deprecation cycle (#45061) by @vasqu in [#45061]

  • fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI (#45004) by @harshaljanjani in [#45004]

  • Allow advanced users to override model_type in AutoConfig.from_pretrained (#45058) by @hmellor in [#45058]

  • Fix failing SmolLM3IntegrationTest (#45048) by @Sai-Suraj-27 in [#45048]

  • chore: remove old extras (#45024) by @tarekziade in [#45024]

  • Embedding VLMs don't need a head (#45000) by @zucchini-nlp in [#45000]

  • Fix GraniteConfig type hints to accept int for multiplier fields (#45019) by @javierdejesusda in [#45019]

  • fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985) by @Krishnachaitanyakc in [#44985]

Significant community contributions

The following contributors have made significant changes to the library since the last release:

  • @ed22699

Internalise the NomicBERT model (#43067)

  • @tarekziade

Use doc-builder runnable example for GLM-ASR (#44277)
refactoring: speedup static checks with disk cache (#44992)
feature: added import complexity checker (#45013)
refactor: added cache in check_repo (#45012)
chore: remove old extras (#45024)
chore: Fix mlinter cache location (#45052)
refactor: speed up docstring checker (#45009)

  • @Krishnachaitanyakc

fix: correct type annotations across config classes for @strict validation (#45007)
fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985)

  • @lashahub

Add Music Flamingo (#43538)

  • @Lidang-Jiang

[Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045)
