Using synthetic training data to improve Flux finetunes
It's easy to fine-tune Flux, but sometimes you need to do a little more work to get the best results. This post covers techniques you can use to improve your fine-tuned Flux models.
Posted September 20, 2024 by zeke
Update (May 2025): We’ve released a faster version of the Flux trainer — try it here.
I know, I know. We keep blogging about Flux. But there’s a reason: It’s really good! People are making so much cool stuff with it, and its capabilities continue to expand as the open-source community experiments with it.
In this post I’ll cover some techniques you can use to generate synthetic training data to help improve the accuracy, diversity, and stylistic range of your fine-tuned Flux models.
Getting started
To use the techniques covered in this post, you should have an existing fine-tuned Flux model that needs a little improvement.
If you haven’t created your own fine-tuned Flux model yet, check out our guides to fine-tuning Flux on the web or fine-tuning Flux with an API, then come back to this post if you need tips to make it better.
What is synthetic data?
Synthetic data is artificially generated data that mimics real-world data. In the case of image generation models, synthetic data refers to images created by the model, rather than real photographs or human-generated artwork. Using synthetic data can help create more varied and comprehensive training datasets than using real-world images alone.
Tip 1: Generate training data from a single image
The consistent-character model is an image generator from the prolific and inimitable @fofr. It takes a single image of a person as input and produces multiple images of them in a variety of poses, styles, and expressions. Using consistent-character is a great way to jumpstart your Flux fine-tuning, especially if you don’t have many training images to start with.
The fofr/consistent-character model produces many images from a single input.
Here’s a quick example of how to use consistent-character with the Replicate JavaScript client to generate a batch of training images from a single image input:
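A minimal sketch follows. The input parameter names (subject, prompt, number_of_outputs, randomise_poses) reflect my reading of the model’s schema, so check the fofr/consistent-character model page for the current inputs, and swap in your own image URL:

```javascript
import Replicate from "replicate";

// Assumes REPLICATE_API_TOKEN is set in your environment
const replicate = new Replicate();

// Input names below are illustrative; confirm them on the model page.
// You may also need to pin a version: "fofr/consistent-character:<version-id>"
const output = await replicate.run("fofr/consistent-character", {
  input: {
    subject: "https://example.com/photo-of-your-person.jpg", // your single input image
    prompt: "a closeup headshot photo",
    number_of_outputs: 20, // how many candidate training images to generate
    randomise_poses: true, // vary the pose across outputs
  },
});

console.log(output); // a list of image URLs to download and review
```

Download the results, keep the convincing ones, and use them as training images.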
Tip 2: Use outputs from your fine-tuned model as training data
Sometimes when you fine-tune Flux, your trained model doesn’t consistently produce the quality of images you want. Maybe one in ten of your outputs meets your expectations. Fortunately you can take those good outputs and use them as training data to train an improved version of your model.
The process works like this:
- Create a new fine-tune with just a handful of images.
- Run your model via the API to generate a large batch of images.
- Comb through the generated images and choose the good ones.
- Run a new training job using those outputs as training data.
To ease the process of generating lots of images from your model and downloading them to your local machine, you can use a tool like aimg. All you need to run aimg is a Replicate API token and a recent version of Node.js.
Here’s a command that will generate 50 images using the exact same prompt each time:
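Something like this (the exact flags are my assumption; run npx aimg --help to see the real interface):

```bash
# Generate 50 images from one fixed prompt with your fine-tuned model.
# Flag names are illustrative; check `npx aimg --help` for the current options.
npx aimg zeke/ziki-flux "ZIKI the man riding a skateboard" --count 50
```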
You can also get more variety in your outputs by using the --subject flag, which will auto-generate a unique prompt for each image:
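For example (same caveat: treat this invocation as a sketch):

```bash
# --subject auto-generates a unique prompt for each image
npx aimg zeke/ziki-flux --subject "ZIKI the man" --count 50
```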
Once you’ve gathered a selection of images that you like, zip them up:
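For example, if your picks are in a selected-images directory (the archive name is up to you):

```bash
zip -r training-images.zip selected-images/
```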
Then kick off a new training job on the web or via the API.
Note: All Replicate models (including fine-tunes) are versioned, so you can use your existing model as the destination model when starting your second training job, and the training process will automatically create a new version of the model. Your first version will remain intact, and you’ll still be able to access it and use it to generate images.
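If you’re starting that second training job via the API, the kickoff might look like the sketch below. It assumes the ostris/flux-dev-lora-trainer trainer and its input names (input_images, trigger_word, steps); copy the trainer’s current version ID from its Replicate page in place of the placeholder:

```javascript
import Replicate from "replicate";

const replicate = new Replicate();

const training = await replicate.trainings.create(
  "ostris",
  "flux-dev-lora-trainer",
  "<version-id-from-the-trainer-page>", // placeholder: copy the latest version ID
  {
    // Reusing your existing model as the destination creates a new version of it
    destination: "your-username/your-flux-model",
    input: {
      input_images: "https://example.com/training-images.zip", // your curated zip from above
      trigger_word: "ZIKI", // keep the same trigger word as your first training
      steps: 1000,
    },
  }
);

console.log(training.status);
```

Because the destination model already exists, this produces a new version alongside your original weights.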
Tip 3: Combine LoRAs to diversify your training data
You may find that your fine-tuned Flux model only outputs images in a single style, like realistic photographs. No matter how many painting-related keywords you put in your prompt, it refuses to produce anything that looks like a painting or an illustration.
A little-known feature of Flux fine-tunes on Replicate is that you can combine multiple LoRA styles in a single output image. LoRA stands for “Low-Rank Adaptation”. I won’t go into technical detail about how LoRAs work here, but the important thing to know is that “LoRA” has become an industry term for trained weights in the context of fine-tuning image models. When you refer to “a LoRA”, you’re talking about a specific set of trained weights that get added to the base Flux model to constitute a fine-tuned model.
“ZIKI the man, illustrated MSMRB style”, created by combining the zeke/ziki-flux human face model with the jakedahn/flux-midsummer-blues illustration style model.
Combining LoRAs is a really fun way of generating unique images, but you can also use it as a technique to diversify your training data to create better versions of your own fine-tuned Flux models.
At a high level, the process works like this:
- Create a fine-tuned model with whatever training data you have available.
- Explore LoRA fine-tunes from the community and pick a few that you like.
- Generate images with your fine-tuned model, combining it with the other LoRAs you selected.
- Comb through the outputs and select the ones that meet your expectations.
- Run a new training job using those outputs as training data.
To find LoRAs to combine with your model, check out the Flux fine-tunes on Replicate and Replicate LoRA fine-tunes on Hugging Face.
To generate images with combined LoRAs using the Replicate API, set the `extra_lora` and `extra_lora_scale` input parameters, and be sure to use the trigger words from both models in your prompt.
Here’s an example of how to generate images with combined LoRAs using the Replicate JavaScript client:
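This sketch reuses the two models from the example image above. The `extra_lora` and `extra_lora_scale` parameters are the documented inputs of Flux fine-tunes on Replicate; you may need to pin a specific version of your own model in the identifier:

```javascript
import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run("zeke/ziki-flux", {
  input: {
    // Include the trigger words from BOTH models in the prompt
    prompt: "ZIKI the man, illustrated MSMRB style",
    extra_lora: "jakedahn/flux-midsummer-blues", // the second LoRA to blend in
    extra_lora_scale: 1.0, // -1 to 2; higher means a stronger extra style
  },
});

console.log(output);
```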
The key things to keep in mind when combining LoRAs are:
- Be sure to use the trigger words from both models in your prompt to ensure that the LoRA styles are applied correctly.
- The `extra_lora` parameter should be set to the name of the LoRA you want to combine with your model. You can use the shorthand name of the model, like jakedahn/flux-midsummer-blues, or the full URL to a weights file.
- The `extra_lora_scale` parameter should be set to a value between -1 and 2. The higher the value, the more pronounced the extra LoRA style will be.
- Try balancing your multiple LoRAs by experimenting with their scales between 0.9 and 1.1.
Have fun and iterate!
Hopefully these training tips will help you get the most out of your fine-tuned Flux model. The key to the fine-tuning process is experimentation and iteration. Try different techniques and see what works best for your use case.
Have fun and share your results with the community on X or Discord.