As others have said, you can fine-tune any model with a fairly small dataset of images and captions so your generations don't look 'AI' or all look the same.
Here's one I made a while back, trained on Sony HDVS HD video demos from the '80s/'90s -- https://civitai.com/models/896279/1990s-analog-hd-or-4k-sony...
(Disclaimer: I am the Krea cofounder and this is based on a small sample size of results I've seen).
First pic (blonde woman with eyes closed) has alt text that begins:
> Extreme close-up portrait of a black man’s face with his eyes closed
copypasta mistake or bad prompt adherence? haha.
(for others: https://civitai.com/models/890536/nasa-astrophotography-or-f...)
Sadly, with SAI going effectively bankrupt, things changed: their rushed 3.0 model was broken beyond repair, and the later 3.5 felt unfinished or something (the API version is remarkably better), with gens full of errors and artifacts even though the good ones looked great. It turned out to be hard to fine-tune as well.
In the meantime Flux got released, but that model can be fried (as in, have one concept trained in) but not properly fine-tuned (this Krea Flux is not based on the open-weights Flux). Add to that that, as models got bigger, training/fine-tuning now costs an arm and a leg, and here we are: a year after Flux's release, a good fine-tune is celebrated as the next new thing :)
> Model builders have been mostly focused on correctness, not aesthetics. Researchers have been overly focused on the extra fingers problem.
While that might be true for the foundation models, the author seems to be neglecting the tens of thousands of custom LoRAs made to customize the look of an image.
> Users fight the “AI Look” with heavy prompting and even fine-tuning
IMHO it is significantly easier to fix an aesthetic issue than an adherence issue. You can take a poor-quality image, run it through ESRGAN upscalers, use it as a ControlNet input for img2img, push it through a different model, add LoRAs, etc.
I have done some nominal tests with Krea, but mostly around adherence. I'd be curious to know if they've reduced the omnipresent bokeh / shallow depth of field, given that it is Flux-based.
> While that might be true for the foundational models
It's possibly true [0] of the models from the big public general AI vendors (OpenAI, Google); it's definitely not true of MJ. If MJ has an aesthetic bias toward what the article describes as "the AI look," it is largely because that was a popular, actively sought and prompted-for look in early AI image gen (a way to avoid the flatness bias of early models), and MJ leaned very hard into biasing toward what was aesthetically popular in that and other areas as it developed. Heck, lots of SD fine-tunes actively sought to reproduce MJ aesthetics for a while.
[0] But I doubt it, and I think they have been actively targeting aesthetics as well as correctness. The post even hints at at least part of how that reinforced the "AI look": the focus on aesthetics meant more reliance on the LAION Aesthetics dataset to tune the models' understanding of what looked good, transferring that dataset's biases into models that were trying to focus on aesthetics.
You'll probably get a lot of replies about how this model is just a fine-tune, and a potential disregard for LoRAs, as if we didn't know about them. The reality is that we have thousands of them running on our platform. Sadly, there's only so much a LoRA or a fine-tune can do before you run into issues that can't be solved until you apply more advanced techniques such as curated post-training runs (including reinforcement learning-based techniques such as Diffusion-PPO[1]), or even large-scale pre-training.
A funny consequence of this is that now it’s really hard to get models to intentionally generate disfigured hands (six fingers, missing middle finger).
Also, there's a lot of "samehand" and hand hiding in BFL and other models. Part of the reason I don't use any MaaS is how hard they focus on manufacturing superficial impressions over improving fundamental understanding and direction-following. Kontext is a nice deviation, but it was already achievable through captioning and model merges.
I did a 50% mix of flux-dev-krea and flux-dev and it is my new favorite base model.
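For anyone wondering what a "50% mix" means mechanically: it's just a linear interpolation of the two checkpoints' weights, key by key. A minimal sketch (plain Python dicts of floats stand in for real tensor state dicts; the parameter names are made up, and real merges would use safetensors/torch and might skip non-weight keys):

```python
def merge_state_dicts(a, b, alpha=0.5):
    """Linearly interpolate two state dicts: (1 - alpha) * a + alpha * b.

    Here the 'state dicts' are plain dicts mapping parameter names to
    lists of floats; with torch you'd lerp tensors instead.
    """
    assert a.keys() == b.keys(), "checkpoints must share the same keys"
    return {
        k: [(1 - alpha) * x + alpha * y for x, y in zip(a[k], b[k])]
        for k in a
    }

# Toy stand-ins for flux-dev and flux-dev-krea weights (hypothetical keys).
dev  = {"blocks.0.weight": [1.0, 2.0], "blocks.0.bias": [0.0, 0.0]}
krea = {"blocks.0.weight": [3.0, 4.0], "blocks.0.bias": [1.0, 1.0]}

merged = merge_state_dicts(dev, krea, alpha=0.5)
print(merged["blocks.0.weight"])  # [2.0, 3.0]
```

The `alpha` knob is what lets you dial the Krea aesthetic in or out against the base model's adherence.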
Which is to say -- if one is in the business or activity of "making AI images go a certain way," a quick perusal of e.g. Civitai turns up about a million solutions to the "problem" of "all the AI art looks the same."
Krea wrote a great post, trained the opinions in during post-training (not via a LoRA), and I've been noticing larger labs doing similar things without discussing it (the default ChatGPT comic-strip style is one example). So I figured I'd write it up for a more general audience and ask whether this is the direction we'll go for qualitative tasks beyond imagery.
Plus, fine-tuning is called out in the post.
Not sure what you were expecting. That sounds like the model is avoiding what it was built to avoid?
This model is not new tech just a change in bias.
It’s doing what it says on the can.
The problem with AI images, in my opinion, is not the generated image itself (that can be better or worse) but the prompts and instructions given to the AI, and their "defaults".
So many blog posts and social media updates have that horrible (again, to me) overly plastic look and feel, like a cartoon that has been burned in... just like "needs more JPEG" but "needs more AI-vibe".
Personally, I think it looks considerably better than the GPT image.
1. The image just seems to be completely unrelated to the actual content of the article
2. The image looks like it came out of SD 1.5 with smeared text, blur, etc.
Releasing weights for FLUX.1 Krea - https://news.ycombinator.com/item?id=44745555 - July 2025 (107 comments)