Tiled Hacker news on React Router

Pico-Banana-400k

395 points - 10/26/2025

Source

vunderba
10/26/2025
From the paper
> The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.
Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that takes a given starting prompt, a list of GenAI models, and a max number of retries which does something similar.
It generates and evaluates images using a separate multimodal AI, and then rewrites failed prompts automatically repeating up to a set limit.
It's not perfect (nine pointed star example in particular) - but often times the "recognition aspect of a multimodal model" is superior to its generative capabilities so you can run it in a sort of REPL until you get the desired outcome.
https://genai-showdown.specr.net/image-editing
ttul
10/26/2025
Image editing model training is fascinating. One method for training image editing models involves using a second model to apply the inverse of the change you want the model to learn. Typically, the task you’re asking the second model to perform is easy, whereas the inverse task is difficult.
For example, you might ask the second model to cover the person’s face with a black square; a VLM model notes that the person is a man with brown hair and round glasses. Then, during training, the resulting image is presented along with the prompt, “Remove the black square from the man’s face. He has brown hair and round glasses.”
The model now learns how to remove black squares and replace them with a man’s face with brown hair and round glasses.
Since the training data is easily synthesized using existing models, you can generate enormous amounts of it - often very cheaply. For specialized editing tasks, this technique is really powerful. Build your training set for your special purpose task, fine tune an existing image editing model such as Qwen Image Edit to produce a new checkpoint or LoRA (often a LoRA is more than good enough) and then you have a special purpose model to perform whatever narrow editing task you need it to perform on your image data.
sollewitt
10/26/2025
AI industry: please _please_ get it together with naming. There shouldn’t be this much overlap between this, a dataset, and a massive image model which was already given a garbage name to begin with.
Don’t get me started in how “agent” is a term of art that means absolutely nothing, encompassing everything from a plain old shell script to a full language model.
skissane
10/26/2025
The license is CC BY-NC-ND - I’m not sure who is going to be able to use it given the NC-ND part… especially given the potential uncertainty over what uses count as commercial and what counts as derivative works. OTOH, given the bulk of this dataset is AI outputs, its copyrightability is an open question.
TechSquidTV
10/26/2025
Can it be? Has Apple FINALLY joined the party? Very ironic they are using an open dataset from Google... and Gemini for prompts by Google.
I'm happy to see something from Apple but this seems so low-tech that it could be one of my own local ComfyUI workflows.
MeteorMarc
10/26/2025
Thought this was about Raspberry Pi because of the assocations with the banana pi and the pi pico.
daemonologist
10/26/2025
I confess that I don't quite get the point here - is it just that they've paid the inference costs for a dataset than can be used for distillation/other research?
zuInnp
10/26/2025
Maybe it is only me, but all the emojis in the readme look like AI wrote it and instantly make stop reading it ...
BarakWidawsky
10/26/2025
Looks like the dataset is distilled from Gemini nano-banana
Definitely very useful, but I’m so curious how the original datasets from these image editing models were created. I’m guessing a lot of it is synthetic data to construct scenes programmatically with layers
cubefox
10/26/2025
Any idea why they didn't use GPT-4o image generation?
ThrowawayTestr
10/26/2025
I love open datasets. The future of LLM is open source models.
djtriptych
10/26/2025
[flagged]
ccapitalK
10/26/2025
[flagged]
sebmellen
10/26/2025
[flagged]
10/26/2025
vednig
10/26/2025
Other Post: https://news.ycombinator.com/item?id=45708493

Pico-Banana-400k

vunderba

svantana

vunderba

MattRix

typpilol

vunderba

typpilol

vunderba

lukasb

vunderba

motbus3

dangoodmanUT

lukasb

scotty79

ttul

onlyrealcuzzo

jerf

ttul

sollewitt

3abiton

andai

fritzo

watersb

NBJack

skissane

niek_pas

qoez

poly2it

skissane

hsbauauvhabzb

littlestymaar

DrewADesign

TechSquidTV

echelon

TechSquidTV

ttul

kranke155

ttul

ThrowawayTestr

MeteorMarc

3form

daemonologist

peddling-brink

zuInnp

thinkingemote

exadeci

stefan_

ThrowawayTestr

huflungdung

BarakWidawsky

ttul

cubefox

whywhywhywhy

Alifatisk

pwython

exadeci

svantana

Jackson__

neom

ThrowawayTestr

djtriptych

ccapitalK

sebmellen

bckr

sebmellen

vednig

cjrd

gus_massa