
FLUX fine-tunes are now fast

Replicate Blog · November 26, 2024

We've made running fine-tunes on Replicate much faster, and the optimizations are open-source.

Posted November 26, 2024 by bfirsh

You can fine-tune FLUX on Replicate with your own data. We’ve made running fine-tunes on Replicate much faster, and the optimizations are open-source.

This builds upon our work from last month, where we made the FLUX base models much faster.

Running a fine-tune is now the same speed as the base model:

  • FLUX.1 [schnell] at 512x512 and 4 steps: 0.6 seconds (P50)

  • FLUX.1 [dev] at 1024x1024 and 28 steps: 2.8 seconds (P50)

The first time you run a fine-tune, it takes a bit of time to load the model, usually about 2.5 seconds. Once it's been loaded, we attempt to route your requests to an instance that already has it, and it runs as fast as the base model.

To enable all optimizations, pass go_fast=true to your prediction. If you omit the go_fast option, it will still be twice as fast as it was before, with no effect on output quality.
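For example, with the Replicate Python client the flag is just another input. This is a minimal sketch: the prompt is illustrative, and the full input schema is on each model's page.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
import replicate

output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={
        "prompt": "an astronaut riding a horse, studio lighting",
        "go_fast": True,  # enable all of the speed optimizations
    },
)
print(output)
```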

All models will get these optimizations automatically, both existing and future.

Load LoRAs from other places

If you’re not using Replicate to fine-tune models, we’ve also added support to load LoRAs from Hugging Face, Civitai, and arbitrary HTTP URLs.

Just pass a Hugging Face, Civitai, or HTTP URL to the lora_weights input in these new LoRA versions of FLUX (there's a call sketch after the list):

  • black-forest-labs/flux-dev-lora

  • black-forest-labs/flux-schnell-lora
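As a rough sketch with the Python client. The Hugging Face repo and prompt here are made up; substitute your own LoRA URL and trigger word.

```python
# Hypothetical example: run flux-dev-lora with externally hosted LoRA weights.
import replicate

output = replicate.run(
    "black-forest-labs/flux-dev-lora",
    input={
        "prompt": "a photo of TOK in a forest, cinematic light",
        # Any Hugging Face, Civitai, or plain HTTP URL works here:
        "lora_weights": "https://huggingface.co/some-user/some-flux-lora",
    },
)
```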

How did we do it?

Most of the models on Replicate are contributed by our community, but we maintain the FLUX models in collaboration with Black Forest Labs.

We optimized the base models by starting from Alex Redden’s flux-fp8-api, compiling it with torch.compile, and then using the fast cuDNN attention kernels in the nightly PyTorch builds. For more details, take a look at our blog post about optimizing the base models.
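In outline, that recipe looks something like the following. This is an illustrative sketch, not our production code; enable_cudnn_sdp is available in recent PyTorch builds.

```python
# Illustrative sketch of the optimization recipe, not the production pipeline.
import torch

# Opt in to the fast cuDNN scaled-dot-product-attention kernels
# (available in recent/nightly PyTorch builds).
torch.backends.cuda.enable_cudnn_sdp(True)

def optimize(transformer: torch.nn.Module) -> torch.nn.Module:
    # Compile the denoising transformer. "max-autotune" spends longer
    # searching for fast kernels on the first run in exchange for
    # faster steady-state inference.
    return torch.compile(transformer, mode="max-autotune")
```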

Fine-tunes on Replicate are represented as LoRAs. We quantize the LoRA as fp8, then merge the weights into the base model. We also automatically increase the lora_scale input by 1.5x when go_fast=true, because we’ve found that produces better output. You might want to play around with this too.
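Conceptually, the merge works like this. A hedged sketch: the names are ours, and the real fp8 path in flux-fp8-api is more involved.

```python
# Conceptual sketch of merging a LoRA into a base weight matrix.
import torch

def merge_lora(
    base_weight: torch.Tensor,  # (out_features, in_features)
    lora_A: torch.Tensor,       # (rank, in_features)
    lora_B: torch.Tensor,       # (out_features, rank)
    lora_scale: float,
    go_fast: bool = True,
) -> torch.Tensor:
    if go_fast:
        lora_scale *= 1.5  # the 1.5x bump we apply when go_fast=true
    # Quantize the low-rank update to fp8, then merge it into the base.
    # fp8 tensors don't support arithmetic directly, so round-trip
    # through the base dtype for the addition.
    delta = (lora_B @ lora_A).to(torch.float8_e4m3fn)
    return base_weight + lora_scale * delta.to(base_weight.dtype)
```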

The quantization in flux-fp8-api slightly changes the output, but we have found it has little impact on the quality.

We want to be open with you about how we’re optimizing the models. It’s notoriously hard to compare output between models and providers, and it’s often unclear whether providers are doing things that impact the quality of the model. We’re just going to tell you how we did it and let you disable any optimizations.

Open-source should be fast too

Open-source models are often slow out of the box. Model providers then optimize these models to make them fast and release them behind proprietary APIs, without contributing the improvements back to the community.

We want to change that. We think open-source should be fast too.

We’re open-sourcing all the improvements we make to FLUX. Read more on our blog post about making the base models fast.

It’s going to get faster

This makes running fine-tuned models faster, but there is still work to be done to make the training process fast. Some major improvements to that are coming up next.

New techniques for making models faster are coming out all the time, and because we collaborate with the community, you can be sure they’ll land on Replicate as fast as possible. Stay tuned.

Follow us on X to keep up to speed.
