FLUX fine-tunes are now fast
We've made running fine-tunes on Replicate much faster, and the optimizations are open-source.
Posted November 26, 2024 by bfirsh
You can fine-tune FLUX on Replicate with your own data, and running those fine-tunes is now much faster. The optimizations are open-source.
This builds upon our work from last month, where we made the FLUX base models much faster.
Running a fine-tune is now the same speed as the base model:
- FLUX.1 [schnell] at 512x512 and 4 steps: 0.6 seconds (P50)
- FLUX.1 [dev] at 1024x1024 and 28 steps: 2.8 seconds (P50)
The first time you run a fine-tune, it takes a moment to load the model, usually about 2.5 seconds. Once it's loaded, we route your requests to an instance that already has it in memory, so it runs as fast as the base model.
To enable all optimizations, pass go_fast=true to your prediction. If you omit the go_fast option, it will still be twice as fast as it was before, with no effect on output quality.
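For example, here is a minimal sketch of a prediction input with the option enabled. The model name and prompt are placeholders; only go_fast is the real input described above:

```python
# Build the prediction input. go_fast enables all of the optimizations
# described above; omitting it still gets you the 2x baseline speedup.
prediction_input = {
    "prompt": "a photo of TOK riding a bicycle",  # placeholder prompt
    "go_fast": True,
}

# With the Replicate Python client, this would run as (model name illustrative):
#   import replicate
#   output = replicate.run("your-username/your-flux-fine-tune", input=prediction_input)
```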
All models, both existing and future, get these optimizations automatically.
Load LoRAs from other places
If you're not using Replicate to fine-tune models, we've also added support for loading LoRAs from Hugging Face, Civitai, and arbitrary HTTP URLs.
Just pass a Hugging Face, Civitai, or HTTP URL to the lora_weights input in these new LoRA versions of FLUX:
- black-forest-labs/flux-dev-lora
- black-forest-labs/flux-schnell-lora
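A sketch of what that input might look like. The repo and URL values below are placeholders illustrating the three accepted forms, not real weights:

```python
# lora_weights accepts any of these forms (all values here are placeholders):
#   "huggingface.co/some-user/some-flux-lora"          Hugging Face repo
#   "civitai.com/models/123456"                        Civitai model page
#   "https://example.com/weights/my-lora.safetensors"  direct HTTP URL
lora_input = {
    "prompt": "a photo in the style of TOK",
    "lora_weights": "huggingface.co/some-user/some-flux-lora",
}

# With the Replicate Python client, this would run as:
#   import replicate
#   output = replicate.run("black-forest-labs/flux-dev-lora", input=lora_input)
```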
How did we do it?
Most of the models on Replicate are contributed by our community, but we maintain the FLUX models in collaboration with Black Forest Labs.
We optimized the base models by starting from Alex Redden's flux-fp8-api, compiling the model with torch.compile, and using the fast cuDNN attention kernels in nightly PyTorch builds. For more details, take a look at our blog post about optimizing the base models.
Fine-tunes on Replicate are represented as LoRAs. We quantize the LoRA to fp8, then merge its weights into the base model. We also automatically increase the lora_scale input by 1.5x when go_fast=true, because we've found that produces better output. You might want to experiment with this too.
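Conceptually, the merge step folds the low-rank LoRA factors into the base weights: W_merged = W + scale * (B @ A). Here is a pure-Python toy sketch of that idea, including the 1.5x scale boost; the real implementation operates on fp8 transformer weights, not 2x2 lists:

```python
def matmul(X, Y):
    """Multiply two small matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, lora_scale, go_fast=False):
    """Fold LoRA factors A and B into base weights W.
    Boosts the scale 1.5x when go_fast is on, as described above."""
    scale = lora_scale * 1.5 if go_fast else lora_scale
    delta = matmul(B, A)  # the low-rank update B @ A
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weights (identity)
B = [[1.0], [0.0]]            # rank-1 LoRA factors
A = [[0.0, 2.0]]
merged = merge_lora(W, A, B, lora_scale=1.0, go_fast=True)
# merged == [[1.0, 3.0], [0.0, 1.0]]: the delta (B @ A) scaled by 1.5
```

Merging ahead of time means inference runs a single set of weights, so a fine-tune costs no more per step than the base model.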
The quantization in flux-fp8-api slightly changes the output, but we have found it has little impact on the quality.
We want to be open with you about how we’re optimizing the models. It’s notoriously hard to compare output between models and providers, and it’s often unclear whether providers are doing things that impact the quality of the model. We’re just going to tell you how we did it and let you disable any optimizations.
Open-source should be fast too
Open-source models are often slow out of the box. Model providers then optimize these models to make them fast and release them behind proprietary APIs, without contributing the improvements back to the community.
We want to change that. We think open-source should be fast too.
We’re open-sourcing all the improvements we make to FLUX. Read more on our blog post about making the base models fast.
It’s going to get faster
This makes running fine-tuned models faster, but there is still work to be done to make the training process fast. Some major improvements to that are coming up next.
New techniques for making models faster come out all the time, and because we collaborate with the community, you can be sure they'll land on Replicate as fast as possible. Stay tuned.
Follow us on X to keep up to speed.