Pico-Banana-400k

261 points - today at 2:01 AM

Source
  • vunderba

    today at 2:25 AM

    From the paper:

    > The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.

    Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that does something similar: it takes a given starting prompt, a list of GenAI models, and a max number of retries.

    It generates images, evaluates them with a separate multimodal AI, and then automatically rewrites failed prompts, repeating up to the set limit.

    It's not perfect (the nine-pointed star example in particular), but oftentimes the "recognition" side of a multimodal model is superior to its generative side, so you can run it in a sort of REPL until you get the desired outcome.

    https://genai-showdown.specr.net/image-editing
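
    Roughly, the core loop might look like the sketch below - the helper functions are hypothetical stubs standing in for the real model APIs, not the actual program:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Attempt:
        model: str
        final_prompt: str
        passed: bool
        critique: str

    def generate_image(model: str, prompt: str) -> bytes:
        """Stub: call the given text-to-image / editing model."""
        raise NotImplementedError

    def judge_image(image: bytes, prompt: str) -> tuple[bool, str]:
        """Stub: ask a separate multimodal judge whether the image satisfies the prompt."""
        raise NotImplementedError

    def rewrite_prompt(prompt: str, critique: str) -> str:
        """Stub: ask an LLM to revise the prompt based on the judge's critique."""
        raise NotImplementedError

    def run_showdown(prompt: str, models: list[str], max_retries: int = 3) -> list[Attempt]:
        results = []
        for model in models:
            current, passed, critique = prompt, False, ""
            for _ in range(max_retries):
                image = generate_image(model, current)
                passed, critique = judge_image(image, prompt)  # judge against the ORIGINAL intent
                if passed:
                    break
                current = rewrite_prompt(prompt, critique)  # feed the failure back in and retry
            results.append(Attempt(model, current, passed, critique))
        return results
    ```

    The one design choice that matters is judging every retry against the original prompt rather than the rewritten one, so the loop can't drift away from the intended outcome.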

      • svantana

        today at 12:28 PM

        That's a great website! Feature request: a button to toggle all the sliders left or right at the same time - it would make it easier to scan the results without lots of finicky mouse moves.

        • lukasb

          today at 3:42 AM

          What do you use for evaluation? gemini-2.5-pro is at the top of MMLU and has been the best for me, but I'm always looking for better.

            • vunderba

              today at 5:17 AM

              Recently I've found myself getting the evaluation simultaneously from OpenAI GPT-5, Gemini 2.5 Pro, and Qwen3 VL to get a kind of "voting system". Purely anecdotal, but I do find that Gemini is the most consistent of the three.
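
              The voting itself can be trivial - something like the sketch below, where the judge callables are hypothetical stand-ins for the three model APIs:

              ```python
              from collections import Counter
              from typing import Callable

              # (image, original prompt) -> did the generation/edit succeed?
              Judge = Callable[[bytes, str], bool]

              def majority_vote(image: bytes, prompt: str, judges: list[Judge]) -> bool:
                  """Majority verdict across all judges; a tie counts as failure."""
                  votes = Counter(judge(image, prompt) for judge in judges)
                  return votes[True] > votes[False]
              ```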

                • motbus3

                  today at 7:48 AM

                  I'm running a similar experiment, but so far changing OpenAI's seed seems to give similar results. If that's confirmed, it's concerning to me how sensitive it could be.
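
                  For reference, the kind of check I mean is sketched below, assuming the `seed` parameter of OpenAI's chat completions API; the model and prompt are illustrative, and the image input is omitted for brevity:

                  ```python
                  from openai import OpenAI

                  client = OpenAI()

                  QUESTION = "Does the edited image match the instruction? Answer PASS or FAIL."

                  def judge_with_seed(seed: int) -> str:
                      resp = client.chat.completions.create(
                          model="gpt-4o",  # illustrative; any model that accepts a seed
                          messages=[{"role": "user", "content": QUESTION}],
                          temperature=0,
                          seed=seed,  # best-effort determinism only
                      )
                      return resp.choices[0].message.content

                  # Identical verdicts across seeds suggest stability w.r.t. sampling noise.
                  print({seed: judge_with_seed(seed) for seed in (1, 2, 3)})
                  ```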

                  • lukasb

                    today at 5:35 AM

                    Interesting, I'll give voting a shot, thanks.

            • typpilol

              today at 3:25 AM

              I love your site - I seem to stumble across it about once a month.

              Or maybe there's another very similar site, but I'm pretty sure it's yours.

                • vunderba

                  today at 5:21 AM

                  Thanks! It's probably the same site. It used to only be a showdown of text-to-image models (Flux, Imagen, Midjourney, etc), but once there was a decent number of image-to-image models (Kontext, Seedream, Nano-Banana) I added a nav bar at the top so I could do similar comparisons for image editing.

                    • typpilol

                      today at 6:45 AM

                      Yes that was exactly it.

                      How often do you update it? It seems like there's something new every time I check. Or maybe I just forget everything...

          • skissane

            today at 4:59 AM

            The license is CC BY-NC-ND - I’m not sure who is going to be able to use it given the NC-ND part… especially given the potential uncertainty over what uses count as commercial and what counts as derivative works. OTOH, given the bulk of this dataset is AI outputs, its copyrightability is an open question.

              • qoez

                today at 11:49 AM

                Output from a generative AI model has already been deemed non-copyrightable, and the license can't really override that.

                • niek_pas

                  today at 9:11 AM

                  > CC-BY-NC-ND, or Creative Commons Attribution NonCommercial NoDerivs, is the most restrictive license offered by Creative Commons. With this license, the user (while attributing the original creator) can only share the work, but not change it in any way or ever use it commercially.

                  • hsbauauvhabzb

                    today at 9:06 AM

                    I find caring about a licence for an LLM highly ironic.

                      • littlestymaar

                        today at 9:34 AM

                        It's not even an LLM, it's a dataset.

                        And clearly, if training on copyrighted material is fair use, as every LLM maker claims, then this license has literally no weight.

                        Also, NAL but IIRC an automatically generated dataset isn't copyrightable in the first place.

                        • DrewADesign

                          today at 10:55 AM

                          Honor among thieves?

                  • TechSquidTV

                    today at 4:19 AM

                    Can it be? Has Apple FINALLY joined the party? It's very ironic that they're using an open dataset from Google... and Google's Gemini for the prompts.

                    I'm happy to see something from Apple but this seems so low-tech that it could be one of my own local ComfyUI workflows.

                      • echelon

                        today at 6:42 AM

                        They're distilling Nano Banana with a Google dataset, letting anyone more easily build and test their own systems. It's kind of funny how easy this is to do.

                        "You wouldn't steal a car," but anyone can distill an expensive, fully trained model in order to build their own.

                        This is going to be one of the most important categories of image model. It's good that we have more than Google and the Chinese labs (ByteDance, et al.) with competent editing models. I don't think Flux Kontext is keeping up.

                        It'd be really nice if we had a Nano Banana-caliber model as open source.

                          • kranke155

                            today at 9:58 AM

                            Flux Kontext is not keeping up. Even then, Flux has become only partially open source - they keep the more advanced models API-only.

                    • daemonologist

                      today at 4:13 AM

                      I confess that I don't quite get the point here - is it just that they've paid the inference costs for a dataset that can be used for distillation/other research?

                        • peddling-brink

                          today at 4:36 AM

                          Essentially yes, it's a dataset that can help train or fine-tune another model, or support similar research. From the site:

                          > Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing. Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.
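
                          For example, a consumer of the dataset might iterate over instruction/edit pairs along the lines of the sketch below - the JSONL field names here are made up, so check the actual release for the real schema:

                          ```python
                          import json
                          from pathlib import Path

                          def load_edit_triplets(path: str):
                              """Yield (source image, instruction, edited image) triplets.

                              Field names are hypothetical; consult the dataset card.
                              """
                              with Path(path).open() as f:
                                  for line in f:
                                      row = json.loads(line)
                                      yield row["source_image"], row["instruction"], row["edited_image"]

                          # Each triplet is one supervised example: condition the editing model
                          # on (source image, instruction) and train it to produce the edited image.
                          ```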

                      • BarakWidawsky

                        today at 5:26 AM

                        Looks like the dataset is distilled from Gemini nano-banana

                        Definitely very useful, but I'm so curious how the original datasets for these image-editing models were created. I'm guessing a lot of it is synthetic data, with scenes constructed programmatically from layers.

                        • zuInnp

                          today at 8:11 AM

                          Maybe it's just me, but all the emojis in the readme make it look like AI wrote it, and they instantly make me stop reading...

                            • thinkingemote

                              today at 8:30 AM

                              Another glaring giveaway is the overuse of numbered lists and bullet-point lists.

                              Personally it makes me less likely to read it but the content might be useful. I have some general tech interest but am not overwhelmingly interested in the subject. Sometimes good things crop up on HN too.

                              Now, if an author were writing for an audience with the intention of attracting people who weren't yet enthusiasts and turning them into enthusiasts of the product, they would create something readable and attractive. The LLM hasn't done that here.

                              Together, this leads me to think that the readme is not for me but is just for dedicated enthusiasts.

                              • huflungdung

                                today at 9:13 AM

                                [dead]

                            • cubefox

                              today at 9:23 AM

                              Any idea why they didn't use GPT-4o image generation?

                                • svantana

                                  today at 12:23 PM

                                  Gemini is better. See results of double-blind evaluation here:

                                  https://lmarena.ai/leaderboard/image-edit

                                  • Jackson__

                                    today at 12:18 PM

                                    Because GPT-4o is too orange (literally).

                                    • Alifatisk

                                      today at 9:50 AM

                                      I think it's because Gemini's Nano Banana is better than 4o image gen at creating and editing images from instructions.

                                  • djtriptych

                                    today at 3:55 AM

                                    [flagged]

                                    • sebmellen

                                      today at 5:44 AM

                                      [flagged]

                                      • vednig

                                        today at 2:30 AM

                                        Other Post: https://news.ycombinator.com/item?id=45708493

                                          • cjrd

                                            today at 3:40 AM

                                            Eh

                                              • gus_massa

                                                today at 9:27 AM

                                                It looks like a post about the presentation at the conference. No discussion. Sometimes the first post about a topic doesn't get traction but a later post gets more popular.

                                        • ccapitalK

                                          today at 8:18 AM

                                          Ok, but why did they call the model Micro-Penis-400k?