
🐶Safetensors audited as really safe and becoming the default

EleutherAI Blog · May 23, 2023


Audit shows that safetensors is safe and ready to become the default

Hugging Face, in close collaboration with EleutherAI and Stability AI, has ordered an external security audit of the safetensors library, the results of which allow all three organizations to move toward making the library the default format for saved models.

The full results of the security audit, performed by Trail of Bits, can be found here: Report.

The following blog post explains the origins of the library, why these audit results are important, and the next steps.

What is safetensors?

🐶Safetensors is a library for saving and loading tensors in the most common frameworks (including PyTorch, TensorFlow, JAX, PaddlePaddle, and NumPy).

For a more concrete explanation, we'll use PyTorch.

import torch
from safetensors.torch import load_file, save_file

weights = {"embeddings": torch.zeros((10, 100))}
save_file(weights, "model.safetensors")
weights2 = load_file("model.safetensors")

It also has a number of cool features compared to other formats, most notably that loading files is safe, as we'll see later.

When you're using transformers, if safetensors is installed, then those files will already be used preferentially in order to prevent issues, which means that

pip install safetensors

is likely to be the only thing needed to run safetensors files safely.

Going forward and thanks to the validation of the library, safetensors will now be installed in transformers by default. The next step is saving models in safetensors by default.
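As a toy illustration of that preference (this is not transformers' actual loading code, just the idea): when a checkpoint directory contains both a pickle-based pytorch_model.bin and a model.safetensors, the safe file wins. The helper name here is hypothetical.

```python
# Hypothetical helper sketching the preference; transformers' real
# logic also handles sharded checkpoints, remote files, and more.
def pick_checkpoint(files):
    if "model.safetensors" in files:
        return "model.safetensors"  # safe format wins when present
    return "pytorch_model.bin"      # fall back to the pickle file

chosen = pick_checkpoint(["pytorch_model.bin", "model.safetensors"])
print(chosen)
```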

We are thrilled to see that the safetensors library is already seeing use in the ML ecosystem, including:

  • Civitai

  • Stable Diffusion Web UI

  • dfdx

  • LLaMA.cpp

Why create something new?

The creation of this library was driven by the fact that PyTorch uses pickle under the hood, which is inherently unsafe. (Sources: 1, 2, video, 3)

With pickle, it is possible to write a malicious file posing as a model that gives full control of a user's computer to an attacker without the user's knowledge, allowing the attacker to steal all their bitcoins 😓.

While this vulnerability in pickle is widely known in the computer security world (and is acknowledged in the PyTorch docs), it’s not common knowledge in the broader ML community.
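To make the risk concrete, here is a harmless stand-in for such an attack. Pickle lets any object define `__reduce__`, and whatever callable it returns is invoked during loading. This toy returns `eval("2 + 2")` where a malicious model file would return something like `os.system(...)`.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object; the callable
    # it returns runs at load time, with the attacker's arguments.
    def __reduce__(self):
        return (eval, ("2 + 2",))

blob = pickle.dumps(Payload())  # what a poisoned "model" would contain
result = pickle.loads(blob)     # arbitrary code executes right here
print(result)
```

Nothing about `pickle.loads` warns the caller that code just ran; the file looks like any other serialized object.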

Since the Hugging Face Hub is a platform where anyone can upload and share models, it is important to make efforts to prevent users from getting infected by malware.

We are also taking steps to make sure the existing PyTorch files are not malicious, but the best we can do is flag suspicious-looking files.

Of course, there are other file formats out there, but none seemed to meet the full set of ideal requirements our team identified.

In addition to being safe, safetensors allows lazy loading and generally faster loads (around 100x faster on CPU).

Lazy loading means loading only part of a tensor in an efficient manner. This particular feature enables arbitrary sharding with efficient inference libraries, such as text-generation-inference, to load LLMs (such as LLaMA, StarCoder, etc.) on various types of hardware with maximum efficiency.
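As a sketch of why lazy loading is cheap in this format: a .safetensors file is an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, and then the raw tensor bytes. A reader can parse just the header and seek straight to one tensor's bytes. The snippet below builds a minimal blob by hand with only the standard library (the tensor bytes are dummy data, not a real checkpoint):

```python
import json
import struct

# Hand-build a minimal .safetensors-style blob:
# [8-byte LE header size][JSON header][raw tensor bytes]
data = bytes(range(8))  # dummy payload standing in for tensor data
header = {
    "embeddings": {"dtype": "F32", "shape": [2, 1],
                   "data_offsets": [0, 8]},
}
hjson = json.dumps(header).encode()
blob = struct.pack("<Q", len(hjson)) + hjson + data

# A lazy reader: parse only the header, then jump to one tensor.
n = struct.unpack("<Q", blob[:8])[0]
meta = json.loads(blob[8:8 + n])
start, end = meta["embeddings"]["data_offsets"]
tensor_bytes = blob[8 + n + start:8 + n + end]
print(len(tensor_bytes))
```

Because offsets are known up front, a real reader can mmap the file and hand out tensor views without ever touching the bytes of tensors it doesn't need.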

Because it loads so fast and is framework agnostic, we can even use the format to load models from the same file in PyTorch or TensorFlow.

The security audit

Since safetensors' main asset is providing safety guarantees, we wanted to make sure it actually delivered. That's why Hugging Face, EleutherAI, and Stability AI teamed up to get an external security audit to confirm it.

Important findings:

  • No critical security flaw leading to arbitrary code execution was found.

  • Some imprecisions in the spec format were detected and fixed.

  • Some missing validation allowed polyglot files, which was fixed.

  • Lots of improvements to the test suite were proposed and implemented.

In the name of openness and transparency, all companies agreed to make the report fully public.

Full report

One important thing to note is that the library is written in Rust. This adds an extra layer of security coming directly from the language itself.

While it is impossible to prove the absence of flaws, this is a major step in giving reassurance that safetensors is indeed safe to use.

Going forward

For Hugging Face, EleutherAI, and Stability AI, the master plan is to shift to using this format by default.

EleutherAI has added support for evaluating models stored as safetensors in their LM Evaluation Harness and is working on supporting the format in their GPT-NeoX distributed training library.

Within the transformers library we are doing the following:

  • Create safetensors.

  • Verify it works and can deliver on all promises (lazy load for LLMs, single file for all frameworks, faster loads).

  • Verify it's safe. (This is today's announcement.)

  • Make safetensors a core dependency. (This is already done or soon to come.)

  • Make safetensors the default saving format. This will happen in a few months, once we have enough feedback to ensure the change causes as little disruption as possible, and once enough users already have the library installed to load new models even on relatively old transformers versions.

As for safetensors itself, we're looking into adding more advanced features for LLM training, which has its own set of issues with current formats.

Finally, we plan to release a 1.0 in the near future, with the large user base of transformers providing the final testing step. The format and the lib have had very few modifications since their inception, which is a good sign of stability.

We're glad we can bring ML one step closer to being safe and efficient for all!
