However, models are now getting to the point where we are starting to learn the bitter lesson[0] even with stuff like color-palette generation. Nano Banana 2 [gemini-3.1-flash-image-preview] especially is adept at performing arbitrary operations on images. Before that, you would have had to use a model such as Gemini Flash to perform segmentation[1] and then post-analyze those segments.
Here's a prompt I used with Nano Banana 2 in AI Studio:
> Derive a coherent, designer's color palette from this image alone.
> Provide 5 distinct HEX color codes as your response.
[Attachment: the picture of the car, the first one in the author's article] [Settings: Output: images & text; Thinking level: minimal]
Response:
> I have extracted five distinct hex color codes directly from the key elements in this image, representing the colorful facade and the vintage car:
> #FF96C5 (The main pink wall)
> #38C6F1 (The light blue car)
> #AEF6A5 (The green wall)
> #E51988 (The dark pink trim and railing)
> #5F432B (The dark wood of the door and windows)
And they all pretty much check out. Not hyper-accurate, but really not far off anymore. I didn't even have to try! (One way to quantify "not far off" is sketched after the footnotes below.)
[0] https://en.wikipedia.org/wiki/Bitter_lesson
[1] https://ai.google.dev/gemini-api/docs/image-understanding#se...
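To be clear, I only eyeballed the colors against the image, but here is a rough, hypothetical sketch of how one could quantify "checks out": compare each suggested hex against the dominant colors from a quick k-means pass over the image. It assumes Pillow, numpy and scikit-learn are installed, and "car.jpg" is a stand-in filename.

    import numpy as np
    from PIL import Image
    from sklearn.cluster import KMeans

    suggested = ["#FF96C5", "#38C6F1", "#AEF6A5", "#E51988", "#5F432B"]

    img = Image.open("car.jpg").convert("RGB")
    img.thumbnail((256, 256))                      # downsample for speed
    pixels = np.asarray(img).reshape(-1, 3).astype(float)

    # Cluster the image into a handful of dominant colors.
    centroids = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels).cluster_centers_

    def hex_to_rgb(h):
        h = h.lstrip("#")
        return np.array([int(h[i:i + 2], 16) for i in (0, 2, 4)], dtype=float)

    for h in suggested:
        rgb = hex_to_rgb(h)
        # Euclidean distance in RGB space to the closest dominant color.
        d = np.linalg.norm(centroids - rgb, axis=1).min()
        print(f"{h}: distance to nearest dominant color = {d:.1f}")

Small distances mean the model's palette lines up with colors that are genuinely dominant in the photo; large ones would flag a hallucinated swatch.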
If the author of the blog post ever comes across this thread/comment: bravo. I hope you feel pride in your work, and I'd go so far as to call it a discovery.
For the fourth iteration (guarding against phantom blue from shadow pixels), I wonder if it might also help to take into account how spatially close together the pixels of each cluster are in the photo itself. None of the heuristics used here seem to look at pixel positions at all, only at their values; as-is, it seems one could sort the photo's pixels before running the program and get the same result. Actual objects usually form connected areas, whereas at least in the fruit image the phantom shadows are spread across the entire photo in largely disconnected chunks.
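Concretely, something like this (a hypothetical sketch, not from the article; assumes numpy, Pillow, scipy and scikit-learn, with "fruit.jpg" as a stand-in filename): cluster pixels by color, then for each cluster check what fraction of its pixels sits in its largest connected region of the image.

    import numpy as np
    from PIL import Image
    from scipy import ndimage
    from sklearn.cluster import KMeans

    img = Image.open("fruit.jpg").convert("RGB")
    img.thumbnail((256, 256))
    arr = np.asarray(img)
    height, width, _ = arr.shape
    pixels = arr.reshape(-1, 3).astype(float)

    # Color-only clustering, as in the article's approach.
    k = 6
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    labels = labels.reshape(height, width)

    for c in range(k):
        mask = labels == c
        # Label the connected regions of this color cluster in image space.
        regions, n_regions = ndimage.label(mask)
        if n_regions == 0:
            continue
        sizes = ndimage.sum(mask, regions, index=np.arange(1, n_regions + 1))
        largest_fraction = sizes.max() / mask.sum()
        # High fraction: the cluster forms one coherent blob (likely an object).
        # Low fraction: the cluster is scattered (likely shadows/noise).
        print(f"cluster {c}: {n_regions} regions, "
              f"largest covers {largest_fraction:.0%} of the cluster")

A scattered cluster could then be down-weighted or dropped before picking the final palette.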
We just threw Cursor/Claude at the images and it dug out the colours we wanted.
Have you ever tried allrgb.com? The idea is to use every 24-bit RGB triplet once and only once. Many people naturally choose 4096x4096 as the final image size.
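That size is natural because 4096 × 4096 = 16,777,216 = 2^24, i.e. exactly one pixel per 24-bit color. A minimal (and deliberately boring) allRGB image takes only a few lines; just a sketch, assuming numpy and Pillow are installed:

    import numpy as np
    from PIL import Image

    values = np.arange(2 ** 24, dtype=np.uint32)    # one integer per 24-bit color
    r = (values >> 16) & 0xFF
    g = (values >> 8) & 0xFF
    b = values & 0xFF

    # Pack the channels and fold the flat list of colors into a 4096x4096 canvas.
    rgb = np.stack([r, g, b], axis=-1).astype(np.uint8).reshape(4096, 4096, 3)
    Image.fromarray(rgb, mode="RGB").save("allrgb.png")

The interesting entries on the site are the ones that rearrange those same 16.7M pixels into something recognizable.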