AI World Clocks

1132 points - yesterday at 6:35 PM


"Every minute, a new clock is rendered by nine different AI models."

  • lanewinfield

    yesterday at 7:59 PM

    hi, I made this. thank you for posting.

    I love clocks and I love finding the edges of what any given technology is capable of.

    I've watched this for many hours and Kimi frequently gets the most accurate clock but also the least variation and is most boring. Qwen is oftentimes the most insane and makes me laugh. Which one is "better?"

      • jdietrich

        today at 2:20 AM

        Clock drawing is widely used as a test for assessing dementia. Sometimes the LLMs fail in ways that are fairly predictable if you're familiar with CSS and typical shortcomings of LLMs, but sometimes they fail in ways that are less obvious from a technical perspective but are exactly the same failure modes as cognitively-impaired humans.

        I think you might have stumbled upon something surprisingly profound.

        https://www.psychdb.com/cognitive-testing/clock-drawing-test

          • overfeed

            today at 5:49 AM

            > Clock drawing is widely used as a test for assessing dementia

            Interestingly, clocks are also an easy tell for when you're dreaming, if you're a lucid dreamer; they never work normally in dreams.

              • danw1979

                today at 8:02 AM

                For me it’s phones… specifically dialling a number manually. No matter how carefully I dial, the number on the screen is rarely correct.

                  • amelius

                    today at 10:55 AM

                    Whenever I dial a number while in a dream, the person I'm trying to call always turns out to be right next to me.

                    • allarm

                      today at 10:32 AM

                      It seems that I’ve been stuck in a lucid dream for a couple of decades: no matter how carefully I write text on a phone keyboard, it never comes out as intended.

                  • ghurtado

                    today at 7:06 AM

                    In lucid dreams there's a whole category of things like this: reading a paragraph of text, looking at a clock (digital or analog), or working any kind of technology more complex than a calculator.

                    For me personally, even light switches have been a huge tell in the past, so basically almost anything electrical.

                    I've always held the utterly unscientific position that this is because the brain only has enough GPU cycles to show you an approximation of what the dream world looks like, but to actually run a whole simulation behind the scenes would require more FLOPs than it has available. After all, the brain also needs to run the "player" threads: It's already super busy.

                    Stretching the analogy past the point of absurdity, this is a bit like modern video game optimizations: the mountains in the distance are just a painting on a surface, and the remote on that couch is just a messy blur of pixels when you look at it up close.

                    So the dreaming brain is like a very clever video game developer, I guess.

                      • tablatom

                        today at 7:20 AM

                        Wait, lucid dreamers need tells to know where they are?!?

                          • lordnacho

                            today at 2:00 PM

                            Didn't you ever watch Inception? You have to carry around a little spinning top to test which level of VM you're inside of.

                            • Kiro

                              today at 7:24 AM

                              Yes, that's how you enter the lucid state. You find ways to tell that you're dreaming and condition yourself to check for those while awake. Eventually you will do it inside a dream and realize that you're dreaming.

                              • Kiboneu

                                today at 8:05 AM

                                Yeah. It’s very common to notice anomalies inside of a dream. But the anomalies weave into the dream and feel normal. You don’t have much agency to enter a lucid state from a pre-lucid dream.

                                So the idea is to develop habits called “reality checks” when you are awake. You look for the broken clock kind of anomalies that the grandparent comment mentioned. You have to be open to the possibility of dreaming, which is hard to do.

                                Consider this difficulty. Are you dreaming?

                                How much time did it take to think “no”? Or did you even take this question seriously? Maybe because you are reading an HN comment about lucid dreams, that question is interpreted as an example instead of a genuine question worth investigating, right? That’s the difficulty. Try it again.

                                The key is that the habit you’re developing isn’t just the check itself — it’s the thinking that you have during the check, which should lead you to investigate.

                                Do these checks frequently enough and you end up doing them in a dream. Boom.

                                There’s also an aspect of identifying recurring patterns during prelucidity. That’s why it helps to keep a dream journal for your non-lucid dreams.

                                There are other methods too.

                                • david-gpu

                                  today at 10:16 AM

                                  Plenty of folks out there know when they are dreaming just like they know when they are awake. It varies from person to person.

                                    • DuperPower

                                      today at 12:04 PM

                                      Be careful: adding consciousness to a dream costs CPU cycles, so you wake up more tired. It's cool for kids and teens, but grown adults shouldn't explore this if they want to avoid bad rest.

                                        • travisjungroth

                                          today at 1:38 PM

                                          That’s a caution against getting addicted to it, not against ever doing it. I’ve had powerful experiences in lucid dreaming that I wouldn’t trade for a little more rest. I was already in a retreat where I was basically resting all the time.

                      • xrisk

                        today at 3:49 AM

                        Maybe explainable via the fact that these tests are part of the LLM training set?

                        • jorgesborges

                          today at 4:24 AM

                          Conceptual deficit is a great failure mode description. The inability to retrieve "meaning" about the clock -- having some understanding about its shape and function but not its intent to convey time to us -- is familiar from a lot of bad LLM output.

                          • ACCount37

                            today at 8:48 AM

                            LLMs don't do this because they have "people with dementia draw clocks that way" in their data. They do it because they're similar enough to human minds in function that they often fail in similar ways.

                            An amusing pattern that dates back to "1kg of steel is heavier of course" in GPT-3.5.

                              • kaffekaka

                                today at 9:46 AM

                                How do you know this?

                                Obviously, humans failing in these ways ARE in the training set. So it should definitely affect LLM output.

                                  • ACCount37

                                    today at 9:59 AM

                                    First: generalization. The failure modes extend to unseen tasks. That specific way to fail at "1kg of steel" sure was in the training data, but novel closed set logic puzzles couldn't have been. They display similar failures. The same "vibe-based reasoning" process of "steel has heavy vibes, feather has light vibes, thus, steel is heavier" produces other similar failures.

                                    Second: the failures go away with capability (raw scale, reasoning training, test-time compute), on seen and unseen tasks both. Which is a strong hint that the model was truly failing, rather than being capable of doing a task but choosing to faithfully imitate a human failure instead.

                                    I don't think the influence of human failures in the training data on the LLMs is nil, but it's not just a surface-level failure repetition behavior.

                            • TheJoeMan

                              today at 2:36 AM

                              Figure 6 with the square clock would be a cool modern art piece.

                          • bspammer

                            yesterday at 11:18 PM

                            If you're keeping all the generated clocks in a database, I'd love to see a Facemash style spin-off website where users pick the best clock between two options, with a leaderboard. I want to know what the best clock Qwen ever made was!

                              • abixb

                                today at 1:17 AM

                                We might be on to creating a new crowd-ranked LLM benchmark here.

                                  • addandsubtract

                                    today at 2:15 AM

                                    A pelican wearing a working watch

                                      • danw1979

                                        today at 8:03 AM

                                        Using it to time a bicycle race?

                                • nightpool

                                  today at 1:03 AM

                                  Yes! Please do this

                                • smusamashah

                                  today at 1:28 AM

                                  Please make it show the last 5 (or some other number of) clocks for each model. It would be nice to see the deviation and variety for each model at a glance.

                                  • charliewallace

                                    today at 2:55 AM

                                    Very cool! I also love clocks, especially weird ones, and recently put up this 3D Moebius Strip clock, hope you like it: https://www.mobiusclock.com

                                    • AnonHP

                                      today at 3:47 AM

                                      Could you please adjust the positions of the titles (like GPT 5)? On Firefox Focus on iOS, the spacing is inconsistent (it seems to move with the space taken by the clock). After one or two of them, I had to scroll all the way down to the bottom and come back up to understand which title is linked to which clock.

                                      • chemotaxis

                                        yesterday at 11:54 PM

                                        This is honestly the best thing I've seen on HN this month. It's stupid, enlightening... funny and profound at the same time. I have a strong temptation to pick some of these designs and build them in real life.

                                        I applaud you for spending money to get it done.

                                        • anigbrowl

                                          yesterday at 8:10 PM

                                          I really like this. The broken ones are sometimes just failures, but sometimes provide intriguing new design ideas.

                                            • jdiff

                                              yesterday at 10:54 PM

                                              This same principle is why my favorite image generation models are the early ones from 2019-2020, which could only reliably generate soup. It's like a Rorschach test: it's not about what's there, it's about what you see in it. I don't want a bot to make art for me; sometimes I just want some shroom-induced inspirational smears.

                                                • nemomarx

                                                  today at 12:40 AM

                                                  I really miss that DeepDream aesthetic with the dog eyes popping up everywhere.

                                          • ks2048

                                            yesterday at 11:48 PM

                                            Nice job! Maybe let users click an example to see the raw source (LLM output)

                                            • brianjking

                                              today at 3:48 AM

                                              This is an awesome benchmark. Officially one of my favorites now. Thank you for making this.

                                              • csours

                                                yesterday at 9:41 PM

                                                LOVE IT!

                                                It would be really cool if I could zoom out and have everything scale properly!

                                                • Fabricio20

                                                  yesterday at 10:09 PM

                                                  Why is this different per user? I sent this to a few friends and they all see different things from what I'm seeing, for the same time?

                                                    • samtheprogram

                                                      yesterday at 10:31 PM

                                                      It regenerates on page load. I find that pretty useful.

                                                      Grok 4 and Kimi nailed it the first time for me, then only Kimi on the second pass.

                                                        • malfist

                                                          today at 2:55 AM

                                                          Not on page load, it regenerates every minute. There's a little hovering question mark in the top right that explains things, including the prompt to the models.

                                                    • hakcermani

                                                      today at 12:56 AM

                                                      Would you mind sharing the prompt, in a gist perhaps?

                                                        • ceroxylon

                                                          today at 1:09 AM

                                                          They have it available on the site under the (?) button:

                                                          "Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."
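Several comments above judge whether a model's clock actually shows the requested time (Kimi "accurate", but "the time is often wrong"). What "correct" means is just arithmetic on hand angles. Below is a minimal TypeScript sketch of that arithmetic; `handAngles` and `toTransform` are hypothetical helpers for illustration, not code from the site or from any model's output.

```typescript
// Hypothetical helper (not the site's code): where each hand of an analog
// clock should point, in degrees clockwise from 12 o'clock.
interface HandAngles {
  hour: number;
  minute: number;
  second: number;
}

function handAngles(hours: number, minutes: number, seconds = 0): HandAngles {
  const h = hours % 12; // 24-hour input folds onto a 12-hour dial
  return {
    // 360° / 12 h = 30° per hour; the hour hand also creeps 0.5° per minute
    // (and 1/120° per second) toward the next numeral.
    hour: h * 30 + minutes * 0.5 + seconds / 120,
    // 360° / 60 min = 6° per minute, plus 0.1° per second.
    minute: minutes * 6 + seconds / 10,
    // 6° per second.
    second: seconds * 6,
  };
}

// Hypothetical: render an angle as the CSS transform a generated clock needs.
function toTransform(deg: number): string {
  return `rotate(${deg}deg)`;
}

const a = handAngles(15, 30); // 3:30 PM
console.log(a, toTransform(a.hour));
```

At 3:30 the hour hand belongs at 105deg, halfway between the 3 and the 4, not pointing straight at the 3. For the "CSS animated second hand" part of the prompt, one common approach (again an assumption, not confirmed for any model here) is a 60 s linear infinite `rotate` keyframe animation with a negative `animation-delay` equal to the current seconds, so the hand starts mid-sweep.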

                                                  • otterley

                                                    yesterday at 7:57 PM

                                                    Watching this over the past few minutes, it looks like Kimi K2 generates the best clock face most consistently. I'd never heard of that model before today!

                                                    Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.

                                                      • wowczarek

                                                        today at 2:00 PM

                                                        Interestingly, either I'm _hallucinating_ this, or DeepSeek started to consistently show a clock without failures and with good time, where it previously didn't. ...aaand as I was typing this, it barfed a train wreck. Never mind, move along... No, wait, it's good again, no, wait...

                                                        • frizlab

                                                          yesterday at 8:11 PM

                                                          I knew of Kimi K2 because it’s the model Kagi uses to generate AI answers when a query ends with a question mark.

                                                            • OJFord

                                                              yesterday at 11:28 PM

                                                              It's also one of the few 'recommended' models in Kagi Assistant (multi-model ChatGPT basically, available on paid plans).

                                                          • frankfrank13

                                                            yesterday at 9:53 PM

                                                            I find that Kimi K2 looks the best, but I've noticed the time is often wrong!

                                                            • nightpool

                                                              today at 1:05 AM

                                                              It would be cool to also AI generate the favicon using some sort of image model.

                                                              • bArray

                                                                yesterday at 8:07 PM

                                                                It could be that the prompt is accidentally (or purposefully) more optimised for Kimi K2, or that Kimi K2 is better trained on this particular data. LLMs need "prompt engineers" for a reason: to get the most out of a particular model.

                                                                  • bigfishrunning

                                                                    yesterday at 9:36 PM

                                                                    How much engineering do prompt engineers do? Is it engineering when you add "photorealistic. correct number of fingers and teeth. High quality." to the end of a prompt?

                                                                    we should call them "prompt witch doctors" or maybe "prompt alchemists".

                                                                      • skeeter2020

                                                                        today at 1:42 PM

                                                                        We used to just call them "good at googling". I've never met a self-described prompt engineer who had anything close to engineering education and experience. Seems like an extension of the 6-week boot camp == software engineer trend.

                                                                        • int_19h

                                                                          today at 12:25 AM

                                                                          I write quite a lot of prompts, and the closest analogy that I can think of is a shaman trying to appease the spirits.

                                                                            • minikomi

                                                                              today at 12:34 AM

                                                                              I find it a surprisingly similar mindset to songwriting: a lot of local maxima searching and spaghetti flinging. Sometimes you hit a good groove and explore it.

                                                                                • skeeter2020

                                                                                  today at 1:43 PM

                                                                                  It might be even more ridiculous to make this something akin to art over engineering.

                                                                            • davidsainez

                                                                              today at 2:39 AM

                                                                              Sure, we are still closer to alchemy than materials science, but it's still early days. But consider this blogpost that was on the front page today: https://www.levs.fyi/blog/2-years-of-ml-vs-1-month-of-prompt.... The table at the bottom shows a generally steady increase in performance just by iterating on prompts. It feels like we are on the path to true engineering.

                                                                                • raddan

                                                                                  today at 3:07 AM

                                                                                  Engineers usually have at least some sense as to why their efforts work though. Does anybody who iterates on prompts have even the fuzziest idea why they work? Or what the improvement might be? I do not.

                                                                                    • skeeter2020

                                                                                      today at 1:48 PM

                                                                                      If there is ANY relationship to engineering here, maybe it's like reverse engineering a BIOS in a clean room, where you poke away and see what happens. The missing part is the use of anything resembling the scientific method in terms of hypothesis, experiment design, observation guiding actions, etc., and the deep knowledge that will allow you to understand WHY something might be happening based on the inputs. "Prompt Engineering" seems about as close to this as probing for land mines on a battlefield, only with no experience and your eyes closed.

                                                                              • WJW

                                                                                yesterday at 11:00 PM

                                                                                Well if it works consistently, I don't see any problem with that. If they have a clear theory of when to add "photorealistic" and when to add "correct number of wheels on the bus" to get the output they want, it's engineering. If they don't have a (falsifiable) theory, it's probably not engineering.

                                                                                Of course, the service they really provide is for businesses to feel they "do AI", and whether or not they do real engineering is as relevant as if your favorite pornstars' boobs are real or not.

                                                                                  • leptons

                                                                                    today at 12:34 AM

                                                                                    >as relevant as if your favorite pornstars' boobs are real or not

                                                                                    This matters more than you might think.

                                                                                    • jahewson

                                                                                      yesterday at 11:23 PM

                                                                                      Maybe we could keep the conversation out of the gutter.

                                                                                        • rrr_oh_man

                                                                                          yesterday at 11:38 PM

                                                                                          Porn is taxable income, not the gutter.

                                                                                          • jrflowers

                                                                                            today at 12:33 AM

                                                                                            You don’t really see much porn in the gutters these days with the decline in popularity of print publishing. It’s almost all online now

                                                                                    • scrollop

                                                                                      yesterday at 10:23 PM

                                                                                      "...and do it really well or my grandmother will be killed by her kidnappers! And I'll give you a tip of 2 billion dollars!!! Hurry, they're coming!"

                                                                                        • carterschonwald

                                                                                          yesterday at 10:40 PM

                                                                                          I've heard this actually works annoyingly well.

                                                                                            • DrewADesign

                                                                                              yesterday at 11:35 PM

                                                                                              We've created technology so sophisticated it is vulnerable to social engineering attacks.

                                                                                                • skeeter2020

                                                                                                  today at 1:50 PM

                                                                                                  This has worked - and continues to work - very well to escape guard rails. If a direct appeal doesn't work, you can then talk them around with only a handful of prompts.

                                                                                          • manmal

                                                                                            yesterday at 11:00 PM

                                                                                            Adding this to my snippets.

                                                                                        • tomrod

                                                                                          yesterday at 11:26 PM

                                                                                          It could be bioengineering if you add that to a clock prompt and then connect it to a CRISPR process for outputting DNA.

                                                                                          Horrifying prospect, tbh

                                                                                          • tamimio

                                                                                            today at 4:47 AM

                                                                                            > we should call them "prompt witch doctors" or maybe "prompt alchemists".

                                                                                            Oh absolutely not! Only in engineering can you get called an engineer for no apparent reason; do that in other white-collar professions and you end up behind bars for fraudulent claims.

                                                                                            • BoorishBears

                                                                                              yesterday at 10:11 PM

                                                                                              I like that actually, I've spent the last year probably 60:40 between post-training and prompt engineering/witch doctoring (the two go together more than most people realize)

                                                                                              Some of it is engineering-like, but I've also picked up a sixth sense when modifying prompts about what parts are affecting the behavior I want to modify for certain models, and that feels very witch doctory!

                                                                                              The more engineering-like part is essentially trying to RE a black box model's post-training, but that goes over some people's heads so I'm happy to help keep the "it's just voodoo and guessing" narrative going instead :)

                                                                                                • lanstin

                                                                                                  today at 12:58 AM

                                                                                                  I think the coherence behind prompt engineering is not in the literal meanings of the words but finding the vocabulary used by the sources that have your solution. Ask questions like a high school math student and you get elementary words back. Ask questions in the lingo of a Linux bigot and you will get good awk scripts back. Use academic maths language and arXiv answers will be produced.

                                                                                              • Dilettante_

                                                                                                yesterday at 9:57 PM

                                                                                                "How is engineering a real science? You just build the bridge so it doesn't fall down."

                                                                                                  • vohk

                                                                                                    yesterday at 10:30 PM

                                                                                                    Nah.

                                                                                                    Actual engineers have professional standards bodies and legal liability when they shirk and the bridge falls down or the plane crashes or your wiring starts on fire.

                                                                                                    Software "engineers" are none of those things but can at least emulate the approaches and strive for reproducibility and testability. Skilled craftsman; not engineers.

                                                                                                    Prompt "engineers" is yet another few steps down the ladder, working out mostly by feel what magic words best tickle each model, and generally with no understanding of what's actually going on under the hood. Closer to a chef coming up with new meals for a restaurant than anything resembling engineering.

                                                                                                    The battle on the use of language around engineer has long been lost but applying it to the subjective creative exercise of writing prompts is just more job title inflation. Something doesn't need to be engineering to be a legitimate job.

                                                                                                      • Dilettante_

                                                                                                        yesterday at 11:41 PM

                                                                                                        > The battle on the use of language around engineer has long been lost

                                                                                                        That's really the core of the issue: We're just having the age-old battle of prescriptivism vs descriptivism again. An "engineer", etymologically, is basically just "a person who comes up with stuff", one who is "ingenious". I'm tempted to say it's you prescriptivists who are making a "battle" out of this.

                                                                                                        > subjective creative exercise of writing prompts

                                                                                                        Implying that there are no testable results, no objective success or failure states? Come on man.

                                                                                                        • jahewson

                                                                                                          today at 2:10 AM

                                                                                                          Engineers use their ingenuity. That’s it.

                                                                                                          If physical engineers understood everything then standards would not have changed in many decades. Safety factors would be mostly unnecessary. Clearly not the case.

                                                                                                            • skeeter2020

                                                                                                              today at 1:57 PM

                                                                                                              >> Engineers use their ingenuity. That’s it.

                                                                                                              If this were enough, all novel creation would be engineering, and that's clearly not true. Engineering attempts to discover and understand consistent outcomes when a myriad of variables are altered, and the boundaries where the variables exceed a model's predictive powers, then adds a buffer for the unknown. Manipulating prompts (and much of software development) attempts to control the model to limit the number of variables and obtain some form of useful abstraction. Physical engineering can't do this.

                                                                                              • andix

                                                                                                yesterday at 11:30 PM

                                                                                                I think the selection of models is a bit off. Haiku instead of Sonnet for example. Kimi K2's capabilities are closer to Sonnet than to Haiku. GPT-5 might be in the non-reasoning mode, which routes to a smaller model.

                                                                                                  • ceroxylon

                                                                                                    today at 1:12 AM

                                                                                                    I had my suspicions about the GPT-5 routing as well. When I first looked at it, the clock was by far the best; after the minute went by and everything refreshed, the next three were some of the worst of the group. I was wondering if it just hit a lucky path in routing the first time.

                                                                                                • energy123

                                                                                                  yesterday at 8:25 PM

                                                                                                  Goes to show the "frontier" is not really one frontier. It's a social/mathematical construct that's useful for a broad comparison, but if you have a niche task, there's no substitute for trying the different models.

                                                                                                  • woodson

                                                                                                    yesterday at 10:30 PM

                                                                                                    Just use something like DSPy/Ax and optimize your module for any given LLM (based on sample data and metrics) and you’re mostly good. No need to manually wordsmith prompts.
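The core idea behind "optimize your module against sample data and metrics" can be sketched without either library. This is a minimal, hypothetical loop that scores candidate prompt templates on a small labelled set and keeps the best one; it is NOT the DSPy or Ax API (those also bootstrap few-shot demos, propose instructions, etc.), and `score_template`, `pick_best_template` and `exact_match` are illustrative names.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces: a "model" maps prompt text to output text,
# and a metric scores one (expected, predicted) pair in [0, 1].
Model = Callable[[str], str]
Metric = Callable[[str, str], float]


def score_template(model: Model, template: str,
                   examples: Sequence[Tuple[str, str]], metric: Metric) -> float:
    """Average metric over a small labelled dev set for one prompt template."""
    total = 0.0
    for inp, expected in examples:
        predicted = model(template.format(input=inp))
        total += metric(expected, predicted)
    return total / len(examples)


def pick_best_template(model: Model, templates: Sequence[str],
                       examples: Sequence[Tuple[str, str]], metric: Metric) -> str:
    """Return the highest-scoring template (on ties, the first one wins)."""
    return max(templates, key=lambda t: score_template(model, t, examples, metric))


def exact_match(expected: str, predicted: str) -> float:
    """Simplest possible metric: 1.0 on an exact (whitespace-trimmed) match."""
    return 1.0 if expected.strip() == predicted.strip() else 0.0
```

With a real model you would swap in an API call for `model` and a task-specific metric; the point is that the prompt wording is chosen by measurement per model rather than by hand.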

                                                                                                    • observationist

                                                                                                      yesterday at 8:47 PM

                                                                                                      It's not fair to use prompts tailored to a particular model when doing comparisons like this - one-shot results that generalize across a domain demonstrate solid knowledge of the domain. You can use prompting and context hacking to get any particular model to behave pseudo-competently in almost any domain, even the tiny <1B models, for some set of questions. You could include an entire framework and model for rendering clocks and times that allowed all 9 models to perform fairly well.

                                                                                                      This experiment, however, clearly states the goal with this prompt: `Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.`
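For reference, here is what a correct answer to that prompt has to encode. A minimal sketch: the prompt text is copied from above, but the `{time}` substitution and its "HH:MM" format are assumptions (the comment only shows the JavaScript placeholder `${time}`), and `build_prompt` / `hand_angles` are hypothetical helper names.

```python
from datetime import time

# Prompt as quoted above; Python's {time} stands in for the JS ${time} placeholder.
PROMPT_TEMPLATE = (
    "Create HTML/CSS of an analog clock showing {time}. Include numbers "
    "(or numerals) if you wish, and have a CSS animated second hand. Make it "
    "responsive and use a white background. Return ONLY the HTML/CSS code "
    "with no markdown formatting."
)


def build_prompt(t: time) -> str:
    # Assumed time format: 24-hour "HH:MM".
    return PROMPT_TEMPLATE.format(time=t.strftime("%H:%M"))


def hand_angles(t: time) -> tuple:
    """(hour, minute, second) hand angles in degrees, clockwise from 12."""
    seconds = t.second
    minutes = t.minute + seconds / 60   # minute hand creeps with the seconds
    hours = (t.hour % 12) + minutes / 60  # hour hand creeps with the minutes
    return (hours * 30.0, minutes * 6.0, seconds * 6.0)
```

For example, at 3:30 the hour hand should sit halfway between 3 and 4 (105°), not on the 3, while the minute hand points straight down (180°).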

                                                                                                      An LLM should be able to interpret that, and should be able to perform a wide range of tasks in that same style: countdown timers, clocks, calendars, a floating quote bubble cycling through a list of 100 pithy quotations, etc. Individual, clearly defined elements should have complex representations in latent space that correspond to the human understanding of those elements. Tasks, operations, and goals should likewise align with our understanding. Qwen 2.5 and some others clearly aren't modeling clocks very well, or maybe their HTML/CSS rendering latents are broken. If you pick a semantic axis (like analog clocks), you can run a suite of tests that demonstrates a model's understanding using limited one-shot interactions.

                                                                                                      Reasoning models can adapt on the fly, and are capable of cheating: one-shot attempts might have crappy representations for some contexts, but after a lot of repetition and refinement, as long as there's a stable, well-represented proxy for quality somewhere in the semantics it understands, the model can deconstruct a task to fundamentals and eventually reach high-quality output.

                                                                                                      These types of tests also allow us to identify mode collapse: you can use complex, sophisticated prompting to get most image models to produce accurate analog clocks displaying any time, but in simple one-shot tests, the models tend to only be able to produce the time 10:10, and you'll get wild artifacts and distortions if you try to force any other configuration of hands.

                                                                                                      Image models are so bad at hands that they couldn't even get clock hands right, until recently anyway. Nano banana and some other models are much better at avoiding mode collapses, and can traverse complex and sophisticated compositions smoothly. You want that same sort of semantic generalization in text generating models, so hopefully some of the techniques cross over to other modalities.

                                                                                                      I keep hoping they'll be able to use SAEs or some other form of analysis on static weight distributions to uncover a structural signature of mode collapse, with a taxonomy of failure modes and causes (limited data, corrupt or poisoned data, and so on). If you had that, you could deliberately iterate on and correct issues, or generate supporting training material to offset big distortions in a model.

                                                                                                          • jquery

                                                                                                            yesterday at 9:04 PM

                                                                                                            Qwen 2.5 is so bad it’s good. Some really insane results if you watch it for a while. Almost like it’s taking the piss.

                                                                                                    • oaktowner

                                                                                                      today at 2:34 AM

                                                                                                      Perhaps Qwen 2.5 should be known as Dali 2.‽

                                                                                                      • jquery

                                                                                                        yesterday at 8:25 PM

                                                                                                        I’ve been using Kimi K2 a lot this month. It gives me Japanese->English translations at near-human quality, while respecting the rules and context I give it in a very long, multi-page system prompt meant to improve translation fidelity for a given target (sometimes markup tags need to be preserved, sometimes deleted, etc.). It doesn’t require a thinking step to reach this level of quality, which makes it suitable for real-time translation. It doesn’t start getting confused when I feed it a couple dozen lines of previous translation context, like certain other LLMs do; the translation actually improves with more context instead of degrading. It’s never refused a translation for “safety” reasons either (GPT and Gemini love to interrupt my novels and tell me certain behavior is illegal or immoral, and censor various anatomical words).
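Feeding "a couple dozen lines of previous translation context" is usually a sliding window over recent source/translation pairs. A minimal sketch under assumptions: the class name, the prompt layout, and `max_pairs=24` are all hypothetical, not the commenter's actual setup, and the model call itself is left out.

```python
from collections import deque


class RollingTranslationContext:
    """Keeps the most recent source/translation pairs to prepend to each request."""

    def __init__(self, system_rules: str, max_pairs: int = 24):
        self.system_rules = system_rules
        # deque(maxlen=...) silently drops the oldest pair once full.
        self.history = deque(maxlen=max_pairs)

    def build_prompt(self, new_line: str) -> str:
        parts = [self.system_rules, "", "Previous lines (source => translation):"]
        for src, tgt in self.history:
            parts.append(f"{src} => {tgt}")
        parts += ["", "Translate the next line:", new_line]
        return "\n".join(parts)

    def record(self, src: str, tgt: str) -> None:
        """Store a finished translation so later requests can see it."""
        self.history.append((src, tgt))
```

Each translated line is passed to `record()` before the next `build_prompt()` call, so the window always holds the latest pairs.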

                                                                                                          • komali2

                                                                                                            today at 2:19 AM

                                                                                                            > GPT and Gemini love to interrupt my novels and tell me certain behavior is illegal or immoral, and censor various anatomical words

                                                                                                            Lol, are you using AI to create fan translations of erotic manga?

                                                                                                              • jquery

                                                                                                                today at 3:12 AM

                                                                                                                No idea what you're talking about... just kidding. Mostly visual novels and light novels, and occasionally erotic stuff lol

                                                                                                        • abixb

                                                                                                          yesterday at 8:16 PM

                                                                                                          >Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.

                                                                                                          More like fell headfirst into the ground.

                                                                                                          I'm disappointed with Gemini 2.5 (not sure whether Pro or Flash) -- I've personally had _fantastic_ results with Gemini 2.5 Pro building PWAs, especially since the May 2025 "coding update." [0]

                                                                                                          [0] https://blog.google/products/gemini/gemini-2-5-pro-updates/

                                                                                                          • paulddraper

                                                                                                            yesterday at 9:20 PM

                                                                                                            Kimi K2 is legitimately good.

                                                                                                            • dilap

                                                                                                              yesterday at 10:57 PM

                                                                                                              I'm a huge K2 fan; it has a personality that feels very distinct from other models (not sycophantic at all), and is quite smart. Also pretty good at creative writing (tho not 100% slop-free).

                                                                                                              K2 hosted on Groq is pretty crazy for intelligence/second. (Low rate limits still, tho.)

                                                                                                              • basch

                                                                                                                yesterday at 9:17 PM

                                                                                                                My GPT-4o was 100% perfect on the first click. Since then, garbage. Gemini 2.5 was perfect on the 3rd click.

                                                                                                                • buffaloPizzaBoy

                                                                                                                  yesterday at 8:58 PM

                                                                                                                  Right as you said that, I checked Kimi K2’s “clock” and it was just the ASCII art: ¯\_(ツ)_/¯

                                                                                                                  I wonder if that is some type of fallback for errors querying the model, or if K2 actually created the HTML/CSS to display that.

                                                                                                                  • Mistletoe

                                                                                                                    yesterday at 9:53 PM

                                                                                                                    Qwen's clocks are highly entertaining. Like if you asked an alien "make me a clock".

                                                                                                                    • kbar13

                                                                                                                      yesterday at 8:48 PM

                                                                                                                      I noticed the second hand is off tho. Gemini has the most accurate one.
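The thread doesn't say why these particular second hands were off, but one common cause with a CSS-animated second hand is that the animation starts at 12 regardless of the actual time. A negative `animation-delay` starts a CSS animation partway through its cycle, which fixes that. A minimal sketch: the `.second-hand` class and `sweep` keyframes are hypothetical names, and `transform-origin: 50% 100%` assumes the hand element's bottom edge sits at the clock's center.

```python
from datetime import datetime


def second_hand_css(now: datetime) -> str:
    """CSS for a second hand that sweeps once per minute, starting at now's second."""
    elapsed = now.second + now.microsecond / 1_000_000
    keyframes = "@keyframes sweep { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }"
    # A negative delay means the 60 s animation is already `elapsed` seconds in
    # when the page loads, so the hand begins at the current second, not at 12.
    rule = f".second-hand {{ transform-origin: 50% 100%; animation: sweep 60s linear infinite; animation-delay: -{elapsed:.3f}s; }}"
    return keyframes + "\n" + rule
```

Because the delay is baked in when the HTML is generated, any lag between generating and displaying the page shows up as a second hand that runs behind by that much.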

                                                                                                                      • stogot

                                                                                                                        yesterday at 9:25 PM

                                                                                                                        When I clicked, everything was garbage except Grok and DeepSeek. Kimi was the worst clock.

                                                                                                                    • baltimore

                                                                                                                      yesterday at 7:08 PM

                                                                                                                      Since the first (good) image generation models became available, I've been trying to get them to generate an image of a clock with 13 instead of the usual 12 hour divisions. I have not been successful. Usually they will just replace the "12" with a "13" and/or mess up the clock face in some other way.

                                                                                                                      I'd be interested if anyone else is successful. Share how you did it!

                                                                                                                        • Scene_Cast2

                                                                                                                          yesterday at 7:14 PM

                                                                                                                          I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).

                                                                                                                            • emp17344

                                                                                                                              yesterday at 8:26 PM

                                                                                                                              Maybe LLMs always fail to generalize outside their data set, and it’s just less noticeable with written language.

                                                                                                                                • cluckindan

                                                                                                                                  yesterday at 10:00 PM

                                                                                                                                  This is it. They’re language models which predict next tokens probabilistically, and a sampler picks one according to the desired “temperature”. Any generalization outside their data set is an artifact of random sampling: happenstance and circumstance, not genuine substance.
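The temperature mechanism mentioned here is easy to show in isolation. A minimal sketch of textbook temperature sampling over a list of logits; real inference stacks usually add top-k/top-p filtering and work on tensors, and the function names are illustrative.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (greedy decoding is plain argmax)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With logits `[2.0, 1.0, 0.0]`, a temperature of 0.1 puts almost all the probability on the first token, while a temperature of 10 makes the three nearly equally likely.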

                                                                                                                                    • cluckindan

                                                                                                                                      today at 1:57 PM

                                                                                                                                      However: do humans have that genuine substance? Is human invention and ingenuity more than trial and error, more than adaptation and application of existing knowledge? Can humans generalize outside their data set?

                                                                                                                                      A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!

                                                                                                                                  • phire

                                                                                                                                    today at 1:34 AM

                                                                                                                                    Most image models are diffusion models, not LLMs, and have a bunch of other idiosyncrasies.

                                                                                                                                    So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.

                                                                                                                                    And the image models which are based on multimodal LLMs (like Nano Banana) seem to do a lot better at novel concepts.

                                                                                                                                    • IshKebab

                                                                                                                                      yesterday at 10:58 PM

                                                                                                                                      They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.

                                                                                                                                      Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.

                                                                                                                                      Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?

                                                                                                                                      Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.

                                                                                                                                  • CobrastanJorji

                                                                                                                                    yesterday at 8:26 PM

                                                                                                                                    Also, they're fundamentally bad at math. They can draw a clock because they've seen clocks, but going further requires some calculations they can't do.

                                                                                                                                    For example, try asking Nano Banana to do something simpler, like "draw a picture of 13 circles." It likely will not work.

                                                                                                                                • andix

                                                                                                                                  yesterday at 11:37 PM

                                                                                                                                  I gave this "riddle" to various models:

                                                                                                                                  > The farmer and the goat are going to the river. They look into the sky and see three clouds shaped like: a wolf, a cabbage and a boat that can carry the farmer and one item. How can they safely cross the river?

                                                                                                                                  Most of them just give the answer to the well-known river crossing riddle. Some "feel" that something is off, but still have a hard time figuring out that the wolf, boat and cabbage are just clouds.

                                                                                                                                • deathanatos

                                                                                                                                  yesterday at 7:45 PM

                                                                                                                                  > Generate an image of a clock face, but instead of the usual 12 hour numbering, number it with 13 hours.

                                                                                                                                  Gemini 2.5 Flash, or "Nano Banana", or whatever we're calling it these days: https://imgur.com/a/1sSeFX7

                                                                                                                                  A normal(ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).
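Why "IIII" is fine but "VIIII" isn't: many clock faces write 4 additively as "IIII" instead of the subtractive "IV", but 8 is "VIII" under either convention, and "VIIII" would read as 9. A minimal converter sketch; the `clock_style_four` flag is a hypothetical option name.

```python
def to_roman(n: int, clock_style_four: bool = False) -> str:
    """Standard subtractive Roman numerals for 1..3999.

    clock_style_four=True returns "IIII" for 4, the clock-face convention.
    """
    if not 1 <= n <= 3999:
        raise ValueError("n must be in 1..3999")
    if clock_style_four and n == 4:
        return "IIII"
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in table:
        count, n = divmod(n, value)  # greedily take as many of this numeral as fit
        out.append(numeral * count)
    return "".join(out)
```

`[to_roman(h, clock_style_four=True) for h in range(1, 13)]` gives the usual clock ring, with "IIII" at 4 and "VIII" at 8.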

                                                                                                                                    • bar000n

                                                                                                                                      yesterday at 7:56 PM

                                                                                                                                      It should be pretty clear already that anything which is based on (limited to?) communicating words/text can never grasp conceptual thinking.

                                                                                                                                      We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

                                                                                                                                        • bayindirh

                                                                                                                                          yesterday at 8:04 PM

                                                                                                                                          > We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

                                                                                                                                          We have a very comprehensive and precise spec for that [0].

                                                                                                                                          If you don't want to hop through the certificate warning, here's the transcript:

                                                                                                                                          - Some day, we won't even need coders any more. We'll be able to just write the specification and the program will write itself.

                                                                                                                                          - Oh wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more.

                                                                                                                                          - Exactly

                                                                                                                                          - And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?

                                                                                                                                          - Uh... no...

                                                                                                                                          - Code, it's called code.

                                                                                                                                          [0]: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...

                                                                                                                                            • snickerbockers

                                                                                                                                              yesterday at 8:18 PM

                                                                                                                                              I've been thinking about that a lot too. Fundamentally it's just a different way of telling the computer what to do, and if it seems like telling an LLM to make a program is less work than writing it yourself, then either your program is extremely trivial or there are dozens of nearly identical programs in the training set.

                                                                                                                                              If you're actually doing real work, you have nothing to fear from LLMs, because any prompt which is specific enough to create a given computer program is going to be comparable in complexity and effort to having done it yourself.

                                                                                                                                          • Uehreka

                                                                                                                                            yesterday at 8:24 PM

                                                                                                                                            I don’t think that’s clear at all. In fact the proficiency of LLMs at a wide variety of tasks would seem to indicate that language is a highly efficient encoding of human thought, much more so than people used to think.

                                                                                                                                              • tsunamifury

                                                                                                                                                today at 1:39 AM

                                                                                                                                                Yea it’s amazing that the parent post literally misunderstands the fundamental realities of LLMs and the compression they reveal in linguistics even if blurry is incredible.

                                                                                                                                            • XenophileJKO

                                                                                                                                              yesterday at 9:40 PM

                                                                                                                                              I mean, that's not really "true".

                                                                                                                                              https://claude.ai/public/artifacts/0f1b67b7-020c-46e9-9536-c...

                                                                                                                                              • rideontime

                                                                                                                                                yesterday at 7:59 PM

                                                                                                                                                Really? I can grasp the concept behind that command just fine.

                                                                                                                                        • edub

                                                                                                                                          today at 7:37 AM

                                                                                                                                          I was able to have AI generate such an image, though not via diffusion/autoregressive image generation but by having it write Python code to create the image.

                                                                                                                                          ChatGPT made a nice-looking clock with matplotlib, but the code had some bugs it had to fix (the hours ran counter-clockwise). Gemini produced correct code in one shot; it used Pillow instead of matplotlib, but the result didn't look as nice.
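Counter-clockwise hours are a classic mix-up between the math convention (angle 0 at 3 o'clock, increasing counter-clockwise) and the clock convention (angle 0 at 12 o'clock, increasing clockwise). Neither model's code is shown in the thread, so the following is only a minimal pure-Python sketch of the clock convention, assuming y-down screen coordinates (as in SVG, HTML canvas, and Pillow images); `hour_position` is a hypothetical helper:

```python
import math


def hour_position(hour: int, hours_on_dial: int, radius: float,
                  cx: float = 0.0, cy: float = 0.0) -> tuple[float, float]:
    """Return the (x, y) screen position of an hour mark.

    Clock convention: angle 0 at the top, increasing clockwise.
    Assumes screen coordinates, where y grows downward.
    """
    # Fraction of a full turn; the top hour (e.g. 12 or 13) maps to angle 0.
    theta = 2 * math.pi * (hour % hours_on_dial) / hours_on_dial
    x = cx + radius * math.sin(theta)  # sin for x: 0 at the top, positive to the right
    y = cy - radius * math.cos(theta)  # minus: screen y points down, so the top is cy - radius
    return x, y


# Where each label of a 13-hour dial would go on a 100-pixel radius.
for h in range(1, 14):
    x, y = hour_position(h, 13, 100)
    print(f"{h:2d}: ({x:7.2f}, {y:7.2f})")
```

For y-up coordinates, such as matplotlib data axes, flip the sign on the `y` line. If the plot uses matplotlib's polar axes, whose default puts angle 0 at the east and runs counter-clockwise, `ax.set_theta_zero_location("N")` together with `ax.set_theta_direction(-1)` switches them to the clock convention.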

                                                                                                                                          • nl

                                                                                                                                            today at 8:33 AM

                                                                                                                                            I do playing-card generation, and almost all models struggle beyond the "6 of X".

                                                                                                                                            My working theory is that they were trained really hard to generate 5 fingers on hands but their counting drops off quickly.

                                                                                                                                            • BrandoElFollito

                                                                                                                                              yesterday at 7:36 PM

                                                                                                                                              This is really cool. I tried to prompt gemini but every time I got the same picture. I do not know how to share a session (like it is possible with Chatgpt) but the prompts were

                                                                                                                                              If a clock had 13 hours, what would be the angle between two of these 13 hours?

                                                                                                                                              Generate an image of such a clock

                                                                                                                                              No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above

                                                                                                                                              This is the same image. There need to be 13 hour marks around the dial, evenly spaced

                                                                                                                                              ...and its last answer was:

                                                                                                                                              You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.

                                                                                                                                              Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.

                                                                                                                                              And the very same clock, with 12 hours, and a 13th above the 12...

                                                                                                                                                • ryandrake

                                                                                                                                                  yesterday at 7:46 PM

                                                                                                                                                  This is probably my biggest problem with AI tools, having played around with them more lately.

                                                                                                                                                  "You're absolutely right! I made a mistake. I have now comprehensively solved this problem. Here is the corrected output: [totally incorrect output]."

                                                                                                                                                  None of them ever seem to have the ability to say "I cannot seem to do this" or "I am uncertain if this is correct, confidence level 25%" The only time they will give up or refuse to do something is when they are deliberately programmed to censor for often dubious "AI safety" reasons. All other times, they come back again and again with extreme confidence as they totally produce garbage output.

                                                                                                                                                    • BrandoElFollito

                                                                                                                                                      yesterday at 7:56 PM

                                                                                                                                                      I agree. I see the same even in simple code, where they will bend over backwards apologizing and then generate very similar crap.

                                                                                                                                                      It is like they are sometimes stuck in a local energy minimum and just wobble around various similar (and incorrect) answers.

                                                                                                                                                      What was annoying in my attempt above is that the picture was identical for every attempt.

                                                                                                                                                        • ryandrake

                                                                                                                                                          yesterday at 8:09 PM

                                                                                                                                                          These tools 'attitude' reminds me of an eager, but incompetent intern or a poorly trained administrative assistant, who works for a powerful CEO. All sycophancy, confidence and positive energy, but not really getting much done.

                                                                                                                                                          • SamBam

                                                                                                                                                            yesterday at 9:18 PM

                                                                                                                                                            The issue is the they always say "Here's the final, correct answer" before they've written the answer, so of course the LLM has no idea if it's going to be right before it starts, because it has no clue what it's going to say.

                                                                                                                                                            I wonder how it would do if instead it were told "Do not tell me at the start that the solution is going to be correct. Instead, tell me the solution, and at the end tell me if you think it's correct or not."

                                                                                                                                                            I have found that on certain logic puzzles that it simply cannot get right, it always tells me that it's going to get it quite "this last time," but if asked later it always recognizes its errors.

                                                                                                                                                        • int_19h

                                                                                                                                                          today at 12:28 AM

                                                                                                                                                          Gemini specifically is actually kinda notorious for giving up.

                                                                                                                                                          https://www.reddit.com/r/artificial/comments/1mp5mks/this_is...

                                                                                                                                                      • notatoad

                                                                                                                                                        yesterday at 11:23 PM

                                                                                                                                                        you can click the share icon (the two-way branch icon, it doesn't look like apple's share icon) under the image it generates to share the conversation.

                                                                                                                                                        i'm curious if the clock image it was giving you was the same one it was giving me

                                                                                                                                                        https://gemini.google.com/share/780db71cfb73

                                                                                                                                                          • BrandoElFollito

                                                                                                                                                            today at 12:57 PM

                                                                                                                                                            Thanks for the tip about sharing!

                                                                                                                                                            No, my clock was an old style one, to be put on a shelf. But at least it had a "13" proudly right above the "12" :)

                                                                                                                                                            This reminds me my kids when they were in kindergarden and were bringing home their art that needed extra explanation to realize what it was. But they were very proud!

                                                                                                                                                    • giancarlostoro

                                                                                                                                                      yesterday at 8:00 PM

                                                                                                                                                      Weird, I never tried that. I tried all the usual tricks that normally work, including swearing at the model (which works surprisingly, even scarily, well with LLMs), and nothing. I even tried going the opposite direction: I wanted a 6-hour clock.

                                                                                                                                                      • chanux

                                                                                                                                                        today at 1:29 AM

                                                                                                                                                        Ah! This is so sad. The manager types won't be able to add an hour (actually, two) to the day even with AI.

                                                                                                                                                        • echelon

                                                                                                                                                          yesterday at 7:23 PM

                                                                                                                                                          That's just a patch to the training data.

                                                                                                                                                          Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.

                                                                                                                                                            • rideontime

                                                                                                                                                              yesterday at 8:00 PM

                                                                                                                                                              What would the "patch" be? Manually create some images of 13-hour clocks and add them to the training data? How does that solution scale?

                                                                                                                                                              • godelski

                                                                                                                                                                yesterday at 8:09 PM

                                                                                                                                                                s/13/17/g ;)

                                                                                                                                                            • snek_case

                                                                                                                                                              yesterday at 7:12 PM

                                                                                                                                                              From my experience they quickly fail to understand anything beyond a superficial description of the image you want.

                                                                                                                                                            • usui

                                                                                                                                                              yesterday at 8:36 PM

                                                                                                                                                              I've been trying for the longest time and across models to generate pictures or cartoons of people with six fingers and now they won't do it. They always say they accomplished it, but the result always has 5 fingers. I hate being gaslit.

                                                                                                                                                              • coffeecoders

                                                                                                                                                                yesterday at 7:28 PM

                                                                                                                                                                LLMs are terrible at out-of-distribution (OOD) tasks. You should use chain-of-thought suppression and give constraints explicitly.

                                                                                                                                                                My prompt to Grok:

                                                                                                                                                                ---

                                                                                                                                                                Follow these rules exactly:

                                                                                                                                                                - There are 13 hours, labeled 1–13.

                                                                                                                                                                - There are 13 ticks.

                                                                                                                                                                - The center of each number is at angle: index * (360/13)

                                                                                                                                                                - Do not infer anything else.

                                                                                                                                                                - Do not apply knowledge of normal clocks.

                                                                                                                                                                Use the following variables:

                                                                                                                                                                HOUR_COUNT = 13

                                                                                                                                                                ANGLE_PER_HOUR = 360 / 13 // 27.692307°

                                                                                                                                                                Use index i ∈ [0..12] for hour marks:

                                                                                                                                                                angle_i = i * ANGLE_PER_HOUR

                                                                                                                                                                I want html/css (single file) of a 13-hour analog clock.

                                                                                                                                                                ---

                                                                                                                                                                Output from Grok:

                                                                                                                                                                https://jsfiddle.net/y9zukcnx/1/
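The prompt's variables map almost one-to-one onto code. A minimal Python sketch that reproduces the angle table (degrees clockwise from the top, index i from 0 to 12) and turns it into CSS rules for the requested single-file clock; the `.tick` class and both helper names are hypothetical, and it assumes each tick element is rotated about the dial's center (for example via `transform-origin`):

```python
HOUR_COUNT = 13
ANGLE_PER_HOUR = 360 / HOUR_COUNT  # ≈ 27.692307 degrees


def hour_mark_angles(hour_count: int = HOUR_COUNT) -> list[float]:
    """angle_i = i * (360 / hour_count) for i in 0..hour_count-1, in degrees."""
    step = 360 / hour_count
    return [i * step for i in range(hour_count)]


def tick_css_rules(hour_count: int = HOUR_COUNT) -> list[str]:
    """One CSS rule per tick mark, rotating it to its angle."""
    return [
        # :nth-child is 1-based, while the prompt's index i starts at 0.
        f".tick:nth-child({i + 1}) {{ transform: rotate({angle:.4f}deg); }}"
        for i, angle in enumerate(hour_mark_angles(hour_count))
    ]


if __name__ == "__main__":
    print("\n".join(tick_css_rules()))
```

Positive `rotate()` angles turn clockwise in CSS, so the prompt's angles can be used without any conversion.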

                                                                                                                                                                  • chemotaxis

                                                                                                                                                                    yesterday at 7:45 PM

                                                                                                                                                                    > Follow these rules exactly:

                                                                                                                                                                    "Here's the line-by-line specification of the program I need you to write. Write that program."

                                                                                                                                                                      • serf

                                                                                                                                                                        today at 10:14 AM

                                                                                                                                        it's lazy to brush off the major advantages of a pseudocode-to-any-language transpiler as if it's somehow easy or commonplace.

                                                                                                                                                                        • signatoremo

                                                                                                                                                                          yesterday at 8:26 PM

                                                                                                                                                                          Can you write this program in any language?

                                                                                                                                                                            • bigfishrunning

                                                                                                                                                                              yesterday at 9:38 PM

                                                                                                                                                                              Yes.

                                                                                                                                                                              • chemotaxis

                                                                                                                                                                                yesterday at 8:44 PM

                                                                                                                                                                                No, do I need to?

                                                                                                                                                                        • BrandoElFollito

                                                                                                                                                                          yesterday at 7:39 PM

                                                                                                                                          Well, that's cheating :) You asked it to generate code, which is OK, but that is not a directly generated image of a clock.

                                                                                                                                          Can Grok generate images? What would the result be?

                                                                                                                                          I will try your prompt on ChatGPT and Gemini.

                                                                                                                                                                            • BrandoElFollito

                                                                                                                                                                              yesterday at 7:45 PM

                                                                                                                                              Gemini failed miserably: a standard 12-hour clock.

                                                                                                                                              Same for ChatGPT.

                                                                                                                                              And Perplexity replaced 12 with 13.

                                                                                                                                                                                • dwringer

                                                                                                                                                                                  yesterday at 8:05 PM

                                                                                                                                                                                  > Please create a highly unusual 13-hour analog clock widget, synchronized to system time, with fully animated hands that move in real time, and not 12 but 13 hour markings - each will be spaced at not 5-minute intervals, but at 4-minute-37-second intervals. This makes room for all 13 hour markings. Please pay attention to the correct alignment of the 13 numbers and the 13 hour marks, as well as the alignment of the hands on the face.

                                                                                                                                                  This gave me a correct clock face on Gemini, after the model spent a lot of time thinking (and thrashing in a loop for a while). The functionality isn't quite right, not that it entirely makes sense in the first place, but the face, at least in terms of the hour marks, looks OK to me.[0]

                                                                                                                                                                                  [0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
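The "4-minute-37-second" figure in that prompt is a rounded value: 13 marks around a 60-minute dial are 60/13 ≈ 4.615 minutes apart, which is about 4 min 36.9 s. A quick check in Python (`mark_interval` is a hypothetical helper):

```python
def mark_interval(hours_on_dial: int, minutes_per_turn: int = 60) -> tuple[int, float]:
    """Minutes and seconds of minute-hand travel between adjacent hour marks."""
    seconds_between_marks = minutes_per_turn * 60 / hours_on_dial
    minutes, seconds = divmod(seconds_between_marks, 60)
    return int(minutes), seconds


print(mark_interval(12))  # a normal dial: 5 minutes between marks
print(mark_interval(13))  # a 13-hour dial: just under 4 min 37 s
```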

                                                                                                                                                                          • chiwilliams

                                                                                                                                                                            yesterday at 7:52 PM

                                                                                                                                            I'll also note that the output isn't quite right: the top number should be 13 rather than 1!

                                                                                                                                                                              • layer8

                                                                                                                                                                                yesterday at 8:15 PM

                                                                                                                                                                                I mean, the specification for the hour marks (angle_i) starts with a mark at angle 0. It just followed that spec. ;)

                                                                                                                                                                            • NooneAtAll3

                                                                                                                                                                              yesterday at 8:44 PM

                                                                                                                                              close enough, but the digit at the top should be the highest, not 1 :/
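Keeping the angle_i spec but putting the highest number at the top only changes which label index 0 gets. A minimal sketch, assuming index 0 is the top mark as in the prompt above; `dial_labels` is a hypothetical helper:

```python
def dial_labels(hour_count: int = 13) -> list[tuple[float, int]]:
    """Pair each mark's angle (degrees clockwise from the top) with its label.

    Index 0 is the top mark, as in the angle_i spec; it gets the highest hour
    (hour_count) instead of 1, the way 12 sits at the top of a normal clock.
    """
    step = 360 / hour_count
    return [(i * step, i if i > 0 else hour_count) for i in range(hour_count)]


for angle, label in dial_labels():
    print(f"{label:2d} at {angle:8.4f} deg")
```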

                                                                                                                                                                          • IAmGraydon

                                                                                                                                                                            yesterday at 7:20 PM

                                                                                                                                            That's because they literally cannot do that. Doing what you're asking requires an understanding of why the numbers on the clock face are where they are and what it would mean if there were an extra hour on the clock (i.e., that you would have to divide 360 by 13 to begin to understand where the numbers would go). AI models have no concept of anything that's not included in their training data. Yet people continue to anthropomorphize this technology and are surprised when it becomes obvious that it's not actually thinking.

                                                                                                                                                                              • energy123

                                                                                                                                                                                yesterday at 7:46 PM

                                                                                                                                                                                The hope was for this understanding to emerge as the most efficient solution to the next-token prediction problem.

                                                                                                                                                                                Put another way, it was hoped that once the dataset got rich enough, developing this understanding is actually more efficient for the neural network than memorizing the training data.

                                                                                                                                                                                The useful question to ask, if you believe the hope is not bearing fruit, is why. Point specifically to the absent data or the flawed assumption being made.

                                                                                                                                                                                Or more realistically, put in the creative and difficult research work required to discover the answer to that question.

                                                                                                                                                                                • bobbylarrybobby

                                                                                                                                                                                  yesterday at 7:24 PM

                                                                                                                                                                                  It's interesting because if you asked them to write code to generate an SVG of a clock, they'd probably use a loop from 1 to 12, using the sin and cos of the angle (the loop index divided by 12, times 2π) to place the numerals. They know how to do this, so they basically understand the process that generates a clock face. And extrapolating from that to 13 hours is trivial (for a human). So the fact that they can't do this extrapolation on their own is very odd.
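The loop described above can be sketched in Python. This is a minimal illustration under assumptions, not any model's actual output: `numeral_positions` and `clock_svg` are hypothetical helper names, angles are measured clockwise from straight up, and the number of hours is a parameter, so going from 12 to 13 hours only changes the step from 360/12 = 30 degrees to 360/13 (about 27.7 degrees).

```python
import math


def numeral_positions(hours=12, cx=100.0, cy=100.0, r=80.0):
    """Return (label, x, y) for each numeral on an `hours`-hour dial.

    Angles are measured clockwise from straight up, so the highest
    numeral (`hours`) lands at the top, like 12 on a normal clock.
    """
    positions = []
    for i in range(1, hours + 1):
        theta = 2 * math.pi * i / hours  # i/hours of a full turn
        # SVG's y axis points down, so "up" means subtracting r*cos(theta).
        x = cx + r * math.sin(theta)
        y = cy - r * math.cos(theta)
        positions.append((i, round(x, 2), round(y, 2)))
    return positions


def clock_svg(hours=12, size=200):
    """Build a bare-bones SVG dial (circle plus numerals) as a string."""
    c = size / 2
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">',
        f'<circle cx="{c}" cy="{c}" r="{c - 5}" fill="none" stroke="black"/>',
    ]
    for label, x, y in numeral_positions(hours, c, c, c * 0.8):
        parts.append(
            f'<text x="{x}" y="{y}" text-anchor="middle" '
            f'dominant-baseline="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(clock_svg(13))  # 13-hour face: same loop, different divisor
```

With the default 200x200 canvas, 12 sits at (100, 20) and 3 at (180, 100) on a 12-hour face; on a 13-hour face, 13 takes the top spot instead.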

                                                                                                                                                                                  • ryandrake

                                                                                                                                                                                    yesterday at 7:49 PM

                                                                                                                                                                                    I wonder if you would have more success if you painstakingly described the shape and features of a clock in great detail but never used the words clock or time or anything that might give the AI the hint that they were supposed to output something like a clock.

                                                                                                                                                                                      • BrandoElFollito

                                                                                                                                                                                        yesterday at 8:00 PM

                                                                                                                                                                                        And this is a problem for me. I guess that it would work, but as soon as the word "clock" appears, gone is the request because a clock HAS.12.HOURS.

                                                                                                                                                                                        I use this a lot in cybersecurity when I need to do something "illegal". I am refused help, until I say that I am doing research on cybersecurity. In that case no problem.

                                                                                                                                                                                    • Workaccount2

                                                                                                                                                                                      yesterday at 8:08 PM

                                                                                                                                                                                      The problem is more likely the tokenization of images than anything. These models do their absolute worst when pictures are involved, but are seemingly miraculous at generalizing with just text.

                                                                                                                                                                                        • chemotaxis

                                                                                                                                                                                          yesterday at 8:50 PM

                                                                                                                                                                                          I wonder if it's because we mean different things by generalization.

                                                                                                                                                                                          For text, "generalization" is still "generate text that conforms to all the usual rules of the language". For images of 13-hour clock faces, we're explicitly asking the LLM to violate the inferred rules of the universe.

                                                                                                                                                                                          I think a good analogy would be asking an LLM to write in English, except the word "the" now means "purple". They will struggle to adhere to this prompt in a conversation.

                                                                                                                                                                                            • Workaccount2

                                                                                                                                                                                              today at 12:50 AM

                                                                                                                                                                                              That's true, but I think humans would stumble a lot too (try reading old printed text from the 18fh cenfury where fhey used "f" insfead of t in prinf, if's a real frick fo gef frough).

                                                                                                                                                                                              However, humans are pretty adept at discerning images, even ones outside the norm. I really think there is some kind of architectural block hampering transformers' ability to really "see" images. For instance, if you show any model a picture of a dog with 5 legs (a fifth leg photoshopped onto its belly), they all say there are only 4 legs. And they will argue with you about it. Hell, GPT-5 even wrote a leg detection script in Python (impressive) which detected the 5 legs, and then it said the script was bugged and modified the parameters until one of the legs wasn't detected, lol.

                                                                                                                                                                                                • onraglanroad

                                                                                                                                                                                                  today at 6:41 AM

                                                                                                                                                                                                  An "f" never replaced a "t".

                                                                                                                                                                                                  You probably mean the "long s" that looks like an "f".

                                                                                                                                                                                      • godelski

                                                                                                                                                                                        yesterday at 8:10 PM

                                                                                                                                                                                        Yes, the problem is that these so called "world models" do not actually contain a model of the world, or any world

                                                                                                                                                                                        • echelon

                                                                                                                                                                                          yesterday at 7:25 PM

                                                                                                                                                                                          gpt-image-1 and Google Imagen understand prompts, they just don't have training data to cover these use cases.

                                                                                                                                                                                          gpt-image-1 and Imagen are wickedly smart.

                                                                                                                                                                                          The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

                                                                                                                                                                                            • phkahler

                                                                                                                                                                                              yesterday at 7:57 PM

                                                                                                                                                                                              >> The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

                                                                                                                                                                                              That's great, but I bet it can't tie its own shoes.

                                                                                                                                                                                                • esafak

                                                                                                                                                                                                  yesterday at 9:29 PM

                                                                                                                                                                                                  And a submarine can't swim. Big deal.

                                                                                                                                                                                                  • echelon

                                                                                                                                                                                                    yesterday at 9:37 PM

                                                                                                                                                                                                    No, but I can get it to do a lot of work.

                                                                                                                                                                                                    It's part of my daily toolbox.

                                                                                                                                                                                    • ryandrake

                                                                                                                                                                                      yesterday at 8:03 PM

                                                                                                                                                                                      I've been struggling all week trying to get Claude Code to write code that produces visual output (not the usual, verifiable text on a terminal): an SDL_GPU-rendered scene consisting of the usual things like shaders, pipelines, buffers, textures and samplers, vertex and index data, and so on. Boy, it just doesn't seem to know what it's doing. Despite paragraphs-long, detailed prompts. Despite describing each uniform and each matrix that needs to be sent. Despite giving it extremely detailed guidance about what order things need to be done in. It would have been faster for me to just write the code myself.

                                                                                                                                                                                      When it fails a couple of times, it will try to put logging in place and then confidently tell me things like "The vertex data has been sent to the renderer, therefore the output is correct!" When I suggest it take a screenshot of the output each time to verify correctness, it does, and then declares victory over an entirely incorrect screenshot. When I suggest it write unit tests, it does so, but the tests are worthless: they only test that the incorrect code it wrote is always incorrect in the same ways.

                                                                                                                                                                                      When it fails even more times, it gets into what I like to call "intern engineer" mode, where it just tries random things that I know are not going to work. And if I let it keep going, it will end up modifying the entire source tree with random "try this" crap. And each iteration, it confidently tells me: "Perfect! I have found the root cause! It is [garbage bullshit]. I have corrected it and the code is now completely working!"

                                                                                                                                                                                      These tools are cute, but they really need to go a long way before they are actually useful for anything more than trivial toy projects.

                                                                                                                                                                                        • rossant

                                                                                                                                                                                          yesterday at 8:27 PM

                                                                                                                                                                                          Have you tried OpenAI Codex with GPT5.1? I'm using it for similar GPU rendering stuff and it appears to do an excellent job.

                                                                                                                                                                                          • fancy_pantser

                                                                                                                                                                                            yesterday at 8:09 PM

                                                                                                                                                                                            Have you given using MCPs to provide documentation and examples a shot? I always have to bring in docs since I don't work in Python and TS+React (which it seems more capable at) and force it to review those in addition to any specification. e.g. Context7

                                                                                                                                                                                              • ryandrake

                                                                                                                                                                                                yesterday at 9:40 PM

                                                                                                                                                                                                Haven't looked into MCPs yet. Thanks for the suggestion!

                                                                                                                                                                                            • poszlem

                                                                                                                                                                                              yesterday at 8:55 PM

                                                                                                                                                                                              I’m not sure if it's just me, but I've also noticed Claude becoming even more lazy. For example, I've asked it several times to fix my tests. It'll fix four or five of them, then start struggling with the next couple, and suddenly declare something like: "All done, fixed 5 out of 10 tests. I can’t fix the remaining ones", followed by a long, convoluted explanation about why that’s actually a good thing.

                                                                                                                                                                                                • __MatrixMan__

                                                                                                                                                                                                  today at 11:12 AM

                                                                                                                                                                                                  I don't know if it has gotten worse, but I definitely find Claude is way too eager to celebrate success when it has done nothing.

                                                                                                                                                                                                  It's annoying but I prefer it to how Gemini gets depressed if it takes a few tries to make progress. Like, thanks for not gaslighing me, but now I'm feeling sorry for a big pile of numbers, which was not a stated goal in my prompt.

                                                                                                                                                                                              • jamilton

                                                                                                                                                                                                yesterday at 8:48 PM

                                                                                                                                                                                                I know this has been said many times before, but I wonder why this is such a common outcome. Maybe from negative outcomes being underrepresented in the training data? Maybe that plus being something slightly niche and complex?

                                                                                                                                                                                                The screenshot method not working is unsurprising to me, VLLMs visual reasoning is very bad with details because they (as far as I understand) do not really have access to those details, just the image embedding and maybe an OCR'd transcript.

                                                                                                                                                                                            • munro

                                                                                                                                                                                              yesterday at 7:50 PM

                                                                                                                                                                                              Amazing. Some people who use LLMs for soft outcomes are so enamored with them that they disagree with me when I say to be careful, they're not perfect. This is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks. "Hey, this test is failing", LLM deletes test, "FIXED!"

                                                                                                                                                                                                • derbOac

                                                                                                                                                                                                  yesterday at 11:39 PM

                                                                                                                                                                                                  Something that struck me when I was looking at the clocks is that we know what a clock is supposed to look and act like.

                                                                                                                                                                                                  What about when we don't know what it's supposed to look like?

                                                                                                                                                                                                  Lately I've been wrestling with the fact that unlike, say, a generalized linear model fit to data with some inferential theory, we don't have a theory or model for the uncertainty about LLM products. We recognize when it's off about things we know are off, but don't have a way to estimate when it's off other than to check it against reality, which is probably the exception to how it's used rather than the rule.

                                                                                                                                                                                                    • ehnto

                                                                                                                                                                                                      today at 5:32 AM

                                                                                                                                                                                                      I need to be delicate with wording here, but this is why it's a worry that all the least intelligent people you know could be using AI.

                                                                                                                                                                                                      It's why non-coders think it's doing an amazing job at software.

                                                                                                                                                                                                      But it's worryingly why using it for research, where you necessarily don't know what you don't know, is going to trip up even smarter people.

                                                                                                                                                                                                  • worldsayshi

                                                                                                                                                                                                    yesterday at 9:07 PM

                                                                                                                                                                                                    Yeah, it seems crazy to use an LLM on any task where the output can't be easily verified.

                                                                                                                                                                                                      • palmotea

                                                                                                                                                                                                        yesterday at 11:12 PM

                                                                                                                                                                                                        > Yeah it seems crazy to use LLM on any task where the output can't be easily verified.

                                                                                                                                                                                                        I disagree, those tasks are perfect for LLMs, since a bug you can't verify isn't a problem when vibecoding.

                                                                                                                                                                                                    • markatkinson

                                                                                                                                                                                                      today at 9:08 AM

                                                                                                                                                                                                      To be fair I'd probably also delete the test.

                                                                                                                                                                                                      • mopsi

                                                                                                                                                                                                        yesterday at 10:28 PM

                                                                                                                                                                                                          > "Hey this test is failing", LLM deletes test, "FIXED!"
                                                                                                                                                                                                        
                                                                                                                                                                                                        A nice continuation of the tradition of folk stories about supernatural entities like teapots or lamps that grant wishes and take them literally. "And that's why, kids, you should always review your AI-assisted commits."

                                                                                                                                                                                                    • kylecazar

                                                                                                                                                                                                      yesterday at 9:58 PM

                                                                                                                                                                                                      Non-determinism at it's finest. The clock is perfect, the refresh happens, the clock looks like a Dali painting.

                                                                                                                                                                                                        • jeremycarter

                                                                                                                                                                                                          today at 5:18 AM

                                                                                                                                                                                                          Last year I wrote a simple system using Semantic Kernel, backed by functions inside Microsoft Orleans, which for the most part was a business logic DSL processor by LLM. Your business logic was just text, and you gave it the operation as text.

                                                                                                                                                                                                          Nothing could be relied upon to be deterministic, it was so funny to see it try to do operations.

                                                                                                                                                                                                          Recently I re-ran it with newer models and was drastically better, especially with temperature tweaks.

                                                                                                                                                                                                      • anon_cow1111

                                                                                                                                                                                                        yesterday at 11:17 PM

                                                                                                                                                                                                        I'm having a hard time believing this site is honest, especially given how ridiculous the scaling and rotation of the numbers is for most of them. I dumped his prompt into ChatGPT to try it myself, and it created a very neat clock face with the numbers in the correct positions and an animated second hand; it just got the exact time wrong, being a few hours off.

                                                                                                                                                                                                        Edit: the time may actually have been perfect now that I account for my isp's geo-located time zone

                                                                                                                                                                                                          • Zopieux

                                                                                                                                                                                                            today at 12:49 AM

                                                                                                                                                                                                            On the contrary, in my experience this is very typical of the average failure mode / output of early 2025 LLMs for HTML of SVG.

                                                                                                                                                                                                            • perfmode

                                                                                                                                                                                                              yesterday at 11:30 PM

                                                                                                                                                                                                              I read that the OP limited the output to 2000 tokens.

                                                                                                                                                                                                                • lanewinfield

                                                                                                                                                                                                                  yesterday at 11:35 PM

                                                                                                                                                                                                                  ^ this! there's a lot of clocks to generate so I've challenged it to stick to a small(er) amount of code

                                                                                                                                                                                                                  • anon_cow1111

                                                                                                                                                                                                                    yesterday at 11:42 PM

                                                                                                                                                                                                                    I got a ~1600 character reply from gpt, including spaces and it worked first shot dumping into an html doc. I think that probably fits ok in the limit? (If I missed something obvious feel free to tell me I'm an idiot)

                                                                                                                                                                                                                      • Springtime

                                                                                                                                                                                                                        today at 1:32 AM

                                                                                                                                                                                                                        On the second minute I had the AI World Clocks site open the GPT-5 generated version displayed a perfect clock. Its clock before and every clock from it since has had very apparent issues though.

                                                                                                                                                                                                                        If you could get a perfect clock several times for the identical prompt in fresh contexts with the same model then it'd be a better comparison. Potentially the ChatGPT site you're using though is doing some adjustments that the API fed version isn't.

                                                                                                                                                                                                            • porphyra

                                                                                                                                                                                                              yesterday at 8:58 PM

                                                                                                                                                                                                              LLMs can't "look" at the rendered HTML output to see if what they generated makes sense or not. But there ought to be a way to do that right? To let the model iterate until what it generates looks right.

                                                                                                                                                                                                              Currently, at work, I'm using Cursor for something that has an OpenGL visualization program. It's incredibly frustrating trying to describe bugs to the AI because it is completely blind. Like I just wanna tell it "there's no line connecting these two points but there ought to be one!" or "your polygon is obviously malformed as it is missing a bunch of points and intersects itself" but it's impossible. I end up having to make the AI add debug prints to, say, print out the position of each vertex, in order to convince it that it has a bug. Very high friction and annoying!!!
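The loop being asked for here (generate, render, look, fix) can be sketched as below. This is a minimal Python sketch of the general idea, not how Cursor or any MCP server actually works: generate, render and critique are hypothetical callables. In practice render might wrap a headless-browser screenshot (Puppeteer comes up further down the thread) and critique might send that image to a vision-capable model.

```python
from typing import Callable, Optional, Tuple

def iterate_with_feedback(
    generate: Callable[[Optional[str]], str],
    render: Callable[[str], bytes],
    critique: Callable[[bytes], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Regenerate code until a critique of its rendered output finds no problem.

    generate(feedback) -> source code; feedback is None on the first round.
    render(source) -> rendered image bytes (e.g. a screenshot).
    critique(image) -> description of a visual bug, or None if it looks right.
    Returns the last source produced and the number of rounds used.
    """
    feedback: Optional[str] = None
    source = ""
    for round_no in range(1, max_rounds + 1):
        source = generate(feedback)
        feedback = critique(render(source))
        if feedback is None:
            return source, round_no
    return source, max_rounds

# Toy stand-ins: the "model" adds the missing line once it is told about it.
def fake_generate(feedback):
    return "polygon+line" if feedback else "polygon"

def fake_render(source):
    return source.encode()

def fake_critique(image):
    return None if b"line" in image else "no line connecting these two points"

print(iterate_with_feedback(fake_generate, fake_render, fake_critique))
```

The useful part is that the critique is phrased the way you would describe the bug to a person ("there's no line connecting these two points"), instead of being reconstructed from debug prints.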

                                                                                                                                                                                                                • firtoz

                                                                                                                                                                                                                  yesterday at 9:08 PM

                                                                                                                                                                                                                  Cursor has this with its "browser" function for web dev; it's quite useful.

                                                                                                                                                                                                                  You can also give it a mcp setup that it can send a screenshot to the conversation, though unsure if anyone made an easy enough "take screenshot of a specific window id" kind of mcp, so may need to be built first

                                                                                                                                                                                                                  I guess you could also ask it to build that MCP for you...

                                                                                                                                                                                                                  • pil0u

                                                                                                                                                                                                                    yesterday at 9:19 PM

                                                                                                                                                                                                                    I had some success providing screenshots to Cursor directly. It worked well for web UIs as well as generated graphs in Python. It makes them a bit less blind, though I feel more iterations are required.

                                                                                                                                                                                                                    • TheKidCoder

                                                                                                                                                                                                                      yesterday at 9:03 PM

                                                                                                                                                                                                                      Kinda - Hand waiving over the question of if an LLM can really "look" but you can connect Cursor to a Puppeteer MCP server which will allow it to iterate with "eyes" by using Puppeteer to screenshot it's own output. Still has issues, but it does solve really silly mistakes often simply by having this MCP available.

                                                                                                                                                                                                                      • EMM_386

                                                                                                                                                                                                                        yesterday at 9:42 PM

                                                                                                                                                                                                                        You can absolutely do this. In fact, Anthropic encourages you to send Claude screenshots. It works very well if you aren't expecting pixel perfection.

                                                                                                                                                                                                                        YMMV with other models, but Sonnet 4.5 is good at things like this: writing the code, "seeing" the output, and then iterating on it.

                                                                                                                                                                                                                        • fragmede

                                                                                                                                                                                                                          yesterday at 9:18 PM

                                                                                                                                                                                                                          Claude totally can, and so can ChatGPT. Upload a picture to either one of them via the app and tell it there's no line where there should be one. There’s some plumbing involved to get it working in Claude Code or Codex, but yes, computers can "see". If you have lm-server, there are tons of non-text models you can point your code at.

                                                                                                                                                                                                                      • zkmon

                                                                                                                                                                                                                        yesterday at 7:07 PM

                                                                                                                                                                                                                        Why are Deepseek and Kimi are beating other models by so much margin? Is this to do with their specialization for this task?


                                                                                                                                                                                                                        • anotheryou

                                                                                                                                                                                                                          today at 10:22 AM

                                                                                                                                                                                                                          Claude Sonnet 4.5 with a little thinking: https://imgur.com/a/zcJOnKy

                                                                                                                                                                                                                          no thinking: better clock but not current time (the prompt is confusing here though): https://imgur.com/a/kRK3Q18

                                                                                                                                                                                                                        • mandolingual

                                                                                                                                                                                                                          yesterday at 9:10 PM

                                                                                                                                                                                                                          Always interesting/uncanny when AI is tested with human cognitive tests https://www.psychdb.com/cognitive-testing/clock-drawing-test.

                                                                                                                                                                                                                          • em3rgent0rdr

                                                                                                                                                                                                                            yesterday at 6:59 PM

                                                                                                                                                                                                                            Most look like they were done by a beginner programmer on crack, but every once in a while a correct one appears.

                                                                                                                                                                                                                              • shafoshaf

                                                                                                                                                                                                                                yesterday at 7:11 PM

                                                                                                                                                                                                                                It's interesting how drawing a clock is one of the primary signals for dementia. https://www.verywellhealth.com/the-clock-drawing-test-98619

                                                                                                                                                                                                                                  • BrandoElFollito

                                                                                                                                                                                                                                    yesterday at 8:14 PM

                                                                                                                                                                                                                                    This is very interesting, thank you.

                                                                                                                                                                                                                                    I could not get to the store because of the cookie banner that does not work (at left on mobile chrome and ff). The Internet Archive page: https://archive.ph/qz4ep

                                                                                                                                                                                                                                    I wonder how this test could be modified for people that have neurological problems - my father's hands shake a lot but I would like to try the test on him (I do not have suspicions, just curious).

                                                                                                                                                                                                                                    I passed it :)

                                                                                                                                                                                                                                    • technothrasher

                                                                                                                                                                                                                                      yesterday at 9:01 PM

                                                                                                                                                                                                                                      "One variation of the test is to provide the person with a blank piece of paper and ask them to draw a clock showing 10 minutes after 11. The word "hands" is not used to avoid giving clues."

                                                                                                                                                                                                                                      Hmm, ambiguity. I would be the smart ass that drew a digital clock for them, or a shaku-dokei.

                                                                                                                                                                                                                                  • pixl97

                                                                                                                                                                                                                                    yesterday at 7:02 PM

                                                                                                                                                                                                                                    DeepSeek and Kimi seem to have correct ones most of the time I've looked.

                                                                                                                                                                                                                                      • BrandoElFollito

                                                                                                                                                                                                                                        yesterday at 8:15 PM

                                                                                                                                                                                                                                        DeepSeek told me that it cannot generate pictures and suggested code (which is very different)

                                                                                                                                                                                                                                        • em3rgent0rdr

                                                                                                                                                                                                                                          yesterday at 7:04 PM

                                                                                                                                                                                                                                          Yes, and sometimes Grok.

                                                                                                                                                                                                                                            • pixl97

                                                                                                                                                                                                                                              yesterday at 7:38 PM

                                                                                                                                                                                                                                              The hour hand commonly seems off on Grok.
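A common way to get the hour hand wrong is to park it exactly on the hour and forget that it creeps forward as the minutes pass. Whether that is what Grok is doing here is an assumption, not something checked against its output. A minimal Python sketch of the standard hand angles (hand_angles is a hypothetical helper name):

```python
from typing import Tuple

def hand_angles(hour: int, minute: int, second: int = 0) -> Tuple[float, float, float]:
    """Clockwise angles in degrees, measured from 12 o'clock, for each hand."""
    # Hour hand: 30 degrees per hour, plus 0.5 degrees for every elapsed minute.
    # (The seconds' contribution to the hour hand is tiny and ignored here.)
    hour_angle = (hour % 12) * 30 + minute / 2
    # Minute hand: 6 degrees per minute, plus 0.1 degrees per elapsed second.
    minute_angle = minute * 6 + second / 10
    # Second hand: 6 degrees per second.
    second_angle = second * 6.0
    return hour_angle, minute_angle, second_angle

print(hand_angles(11, 10))  # (335.0, 60.0, 0.0)
```

For 10 minutes after 11, the clock-drawing variant quoted above, the hour hand belongs at 335°, one sixth of the way from the 11 toward the 12, not on the 11 itself at 330°.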

                                                                                                                                                                                                                                      • energy123

                                                                                                                                                                                                                                        yesterday at 7:51 PM

                                                                                                                                                                                                                                        If they can identify which one is correct, then it's the same as always being correct, just with an expensive compute budget.
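This is essentially best-of-n sampling with a verifier (a form of rejection sampling). A minimal Python sketch; generate and verify are hypothetical stand-ins for a model call and for whatever check decides a clock is correct, and the scheme assumes that check is reliable:

```python
from typing import Callable, Optional, Tuple

def best_of_n(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    n: int,
) -> Tuple[Optional[str], int]:
    """Draw up to n candidates and return the first one the verifier accepts.

    Returns (candidate, attempts_used), or (None, n) if every candidate failed.
    """
    for attempt in range(1, n + 1):
        candidate = generate(attempt)
        if verify(candidate):
            return candidate, attempt
    return None, n

# Toy stand-ins: two melted clocks, then a correct one.
outputs = ["dali clock", "dali clock", "correct clock"]
result = best_of_n(lambda i: outputs[i - 1], lambda c: c == "correct clock", n=3)
print(result)  # ('correct clock', 3)
```

Each rejected candidate is a full generation, so the cost grows with the failure rate, and the answer is only as trustworthy as verify.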

                                                                                                                                                                                                                                        • morkalork

                                                                                                                                                                                                                                          yesterday at 7:01 PM

                                                                                                                                                                                                                                          I'd say more like a blind programmer in the early stages of dementia. Able to write code, unable to form a mental image of what it would render as and can't see the final result.

                                                                                                                                                                                                                                      • arendtio

                                                                                                                                                                                                                                        today at 10:31 AM

                                                                                                                                                                                                                                        Pretty cool already!

                                                                                                                                                                                                                                        I use 'Sonnet 4.5 thinking' and 'Composer 1' (Cursor) the most, so it would be interesting to see how such SOTA models perform in this task.

                                                                                                                                                                                                                                        • ugh123

                                                                                                                                                                                                                                          yesterday at 6:59 PM

                                                                                                                                                                                                                                          Cool, and marginally informative on the current state of things. but kind of a waste of energy given everything is re-done every minute to compare. We'd probably only need a handful of each to see the meaningful differences.

                                                                                                                                                                                                                                            • whoisjuan

                                                                                                                                                                                                                                              yesterday at 7:05 PM

                                                                                                                                                                                                                                              It's actually quite fascinating if you watch it for 5 minutes. Some models are overall bad, but others nail it in one minute and butcher it in the next.

                                                                                                                                                                                                                                              It's perhaps the best example I have seen of model drift driven by just small, seemingly unimportant changes to the prompt.

                                                                                                                                                                                                                                                • alister

                                                                                                                                                                                                                                                  yesterday at 7:19 PM

                                                                                                                                                                                                                                                  > model drift driven by just small, seemingly unimportant changes to the prompt

                                                                                                                                                                                                                                                  What changes to the prompt are you referring to?

                                                                                                                                                                                                                                                  According the comment on the site, the prompt is the following:

                                                                                                                                                                                                                                                  Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting.

                                                                                                                                                                                                                                                  The prompt doesn't seem to change.

                                                                                                                                                                                                                                                    • whoisjuan

                                                                                                                                                                                                                                                      yesterday at 7:45 PM

                                                                                                                                                                                                                                                      The time given to the model. So the difference between two generations is just somethng trivially different like: "12:35" vs 12:36"
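To make that concrete: assuming the site fills ${time} with a plain "HH:MM" string (its actual time format and code aren't shown in the thread), Python's string.Template happens to use the same ${name} placeholder syntax, so the prompt quoted above can be filled in directly. Consecutive minutes then differ by a single character.

```python
from string import Template

# The prompt as quoted in the thread; only ${time} changes between generations.
PROMPT = Template(
    "Create HTML/CSS of an analog clock showing ${time}. Include numbers "
    "(or numerals) if you wish, and have a CSS animated second hand. Make it "
    "responsive and use a white background. Return ONLY the HTML/CSS code "
    "with no markdown formatting."
)

a = PROMPT.substitute(time="12:35")
b = PROMPT.substitute(time="12:36")

# Count positions where the two filled-in prompts differ (they have equal length).
differing = sum(x != y for x, y in zip(a, b))
print(differing)  # 1
```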

                                                                                                                                                                                                                                                      • sambaumann

                                                                                                                                                                                                                                                        yesterday at 7:29 PM

                                                                                                                                                                                                                                                        presumably the time is replaced with the actual current time at each generation. I wonder if they are actually generated every minute or if all 6480 permutations (720 minutes in a day * 9 llms) were generated and just show on a schedule
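The arithmetic behind the pregeneration idea can be sketched as follows. This sketch assumes the prompt's time is a 12-hour "H:MM" string, so 1:05 PM and 1:05 AM share one cache entry. If 24-hour strings were kept distinct, there would be 1,440 times per model instead. The model names and `cache_key` are hypothetical placeholders.

```python
from itertools import product

# Hypothetical placeholders for the nine models on the page.
MODELS = [f"model_{i}" for i in range(9)]

# Every minute an analog dial can distinguish: 12 hours x 60 minutes.
DIAL_TIMES = [f"{h}:{m:02d}" for h in range(1, 13) for m in range(60)]


def cache_key(model, hour24, minute):
    """Map a 24-hour wall-clock time onto the 12-hour dial time used as the key."""
    h12 = hour24 % 12 or 12
    return (model, f"{h12}:{minute:02d}")


ALL_KEYS = set(product(MODELS, DIAL_TIMES))
print(len(DIAL_TIMES), len(ALL_KEYS))  # 720 6480
```

With this scheme, a scheduler would look up `cache_key(model, now.hour, now.minute)` in the pregenerated set instead of calling the model each minute.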

                                                                                                                                                                                                                                                    • nbaugh1

                                                                                                                                                                                                                                                      yesterday at 7:37 PM

                                                                                                                                                                                                                                                      It is really interesting to watch them for a while. QWEN keeps outputting some really abstract interpretations of a clock, KIMI is consistently very good, GPT5's results line up exactly with my experience with its code output (overly complex and never working correctly)

                                                                                                                                                                                                                                                      • bglusman

                                                                                                                                                                                                                                                        yesterday at 8:50 PM

                                                                                                                                                                                                                                                        We can't know how much is about the prompt though and how much is just stochastic randomness in the behavior of that model on that prompt, right? I mean, even given identical prompts, even at temp 0, models don't always behave identically.... at least, as far as I know? Some of the reasons why are I think still a research question, but I think its a fact nonetheless.
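One commonly cited contributor, though not the only one, is floating-point arithmetic. Addition is not associative, so if batching or parallel kernels sum the same values in a different order, logits can differ in the last bits. Greedy "temperature 0" decoding can then pick a different token when two candidates are nearly tied. Below is a minimal illustration in plain Python. `greedy` is a hypothetical stand-in for argmax decoding, not any real inference engine.

```python
def greedy(logits):
    """Temperature-0 style decoding: pick the index of the largest logit.

    max() returns the first maximal element, so exact ties go to the lower index.
    """
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three numbers summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)  # 0.6000000000000001 0.6 False

# A toy competing token with logit 0.6: the winner depends on summation order.
print(greedy([0.6, a]), greedy([0.6, b]))  # 1 0
```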

                                                                                                                                                                                                                                                        • moffkalast

                                                                                                                                                                                                                                                          yesterday at 7:26 PM

                                                                                                                                                                                                                                                          Kimi seems the only reliable one which is a bit surprising, and GPT 4o is consistently better than GPT 5 which on the other hand is unfortunately not surprising at all.

                                                                                                                                                                                                                                                      • energy123

                                                                                                                                                                                                                                                        yesterday at 7:50 PM

                                                                                                                                                                                                                                                        I sort of assumed they cached like 30 inferences and just repeat them, but maybe I'm being too cynical.

                                                                                                                                                                                                                                                        • ascorbic

                                                                                                                                                                                                                                                          yesterday at 7:06 PM

                                                                                                                                                                                                                                                          The energy usage is minuscule.

                                                                                                                                                                                                                                                            • ugh123

                                                                                                                                                                                                                                                              today at 3:22 AM

                                                                                                                                                                                                                                                              Hmm, curious. How did you come up with that?

                                                                                                                                                                                                                                                              • jdiff

                                                                                                                                                                                                                                                                yesterday at 7:15 PM

                                                                                                                                                                                                                                                                It's wasteful. If someone built a clock out of 47 microservices that called out to 193 APIs to check the current time, location, time zone, and preferred display format we'd rightfully criticize it for similar reasons.

                                                                                                                                                                                                                                                                In a world where Javascript and Electron are still getting (again, rightfully) skewered for inefficiency despite often exceeding the performance of many compiled languages, we should not dismiss the discussion around efficiency so easily.

                                                                                                                                                                                                                                                                  • berkes

                                                                                                                                                                                                                                                                    yesterday at 7:54 PM

                                                                                                                                                                                                                                                                    Yes it is wasteful.

                                                                                                                                                                                                                                                                    But I presume you light up Christmas lights in December, drive to the theater to watch a movie, or fire up a campfire on holiday. That too is "wasteful". It's not needed, and other, often far more efficient, ways exist to achieve the same thing. And in absolute numbers, each of those is far more energy-intensive than running an LLM to create 9 clocks every minute. We do things to learn, have fun, be weird, make art, or just spend time.

                                                                                                                                                                                                                                                                    Now, if Rolex starts building watches by running an LLM to drive its production machines or if we replace millions of wall clocks with ones that "Run an LLM every second", then sure, the waste is an actual problem.

                                                                                                                                                                                                                                                                    Point I'm trying to make is that it's OK to consider or debate the energy use of LLMs compared to alternatives. But that bringing up that debate in a context where someone is creative, or having a fun time, its not, IMO. Because a lot of "fun" activities use a lot of energy, and that too isn't automatically "wasteful".

                                                                                                                                                                                                                                                                    • Arisaka1

                                                                                                                                                                                                                                                                      yesterday at 7:22 PM

                                                                                                                                                                                                                                                                      What I find amusing about this argument is that no one ever brought up power savings when, e.g., someone used "let me google that for you" instead of giving someone the answer to their question, because we saw the utility of teaching others how to Google. But apparently we can't see the utility of measuring the oversold competence of current AI models, given a sufficiently large sample size.

                                                                                                                                                                                                                                                                      • saulpw

                                                                                                                                                                                                                                                                        yesterday at 7:37 PM

                                                                                                                                                                                                                                                                        Let's do some math.

                                                                                                                                                                                                                                                                        60x24x30 = 40k AI calls per month per model. Let's suppose there are 1000 output tokens (might it be 10k tokens? Seems like a lot for this task). So 40m tokens per model.

                                                                                                                                                                                                                                                                        The price for 1m output tokens[0] ranges from $.10 (qwen-2.5) to $60 (GPT-4). So $4/mo for the cheapest, and $2.5k/mo for the most expensive.

                                                                                                                                                                                                                                                                        So this might cost several thousand dollars a month? Something smells funny. But you're right, throttling it to once an hour would achieve a similar goal and likely cost less than $100/mo (which is still more than I would spend on a project like this).

                                                                                                                                                                                                                                                                        [0] https://pricepertoken.com/
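The estimate above can be reproduced as a short sketch using the comment's own assumptions: 1,000 output tokens per call, and $0.10 or $60 per million output tokens. Input-token cost is ignored, and `monthly_cost_usd` is a hypothetical helper.

```python
CALLS_PER_MONTH = 60 * 24 * 30  # one call per minute for 30 days = 43,200
CALLS_HOURLY = 24 * 30          # throttled to one call per hour = 720
TOKENS_PER_CALL = 1000          # the comment's assumption


def monthly_cost_usd(calls, tokens_per_call, usd_per_million_tokens):
    """Output-token cost only; input tokens and any caching discounts are ignored."""
    return calls * tokens_per_call / 1_000_000 * usd_per_million_tokens


for label, price in [("$0.10/M (cheapest)", 0.10), ("$60/M (most expensive)", 60.0)]:
    every_minute = monthly_cost_usd(CALLS_PER_MONTH, TOKENS_PER_CALL, price)
    every_hour = monthly_cost_usd(CALLS_HOURLY, TOKENS_PER_CALL, price)
    print(f"{label}: ${every_minute:,.2f}/mo per-minute, ${every_hour:,.2f}/mo hourly")
```

With these inputs, the per-minute figures come out to about $4.32 and $2,592 per month. Hourly throttling brings them to about $0.07 and $43.20. Both results are consistent with the estimates above.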

                                                                                                                                                                                                                                                                          • qwe----3

                                                                                                                                                                                                                                                                            yesterday at 10:28 PM

                                                                                                                                                                                                                                                                            They use 4o (maybe a mini version?)(

                                                                                                                                                                                                                                                            • wanderingmind

                                                                                                                                                                                                                                                              today at 3:35 AM

                                                                                                                                                                                                                                                              The more I look at it, the more I realise the reason for the cognitive overload I feel when using LLMs for coding. The same prompt to the same model for a pretty straightforward task produces wildly different outputs. Now imagine how wildly different the code output is when trying to generate two different logical functions: the casing is different, the commenting is different, and there's no semantic continuity. Maybe if I give detailed prompts and ask it to follow them, it might, but in my experience prompt adherence is not so great either. I am at the stage where I just use LLMs as autocorrect rather than for any generation.

                                                                                                                                                                                                                                                              • gwbas1c

                                                                                                                                                                                                                                                                yesterday at 9:55 PM

                                                                                                                                                                                                                                                                Reminds me of the Alzheimer's "draw a clock" test.

                                                                                                                                                                                                                                                                Makes me think that LLMs are like people with dementia! Perhaps it's the best way to relate to an LLM?

                                                                                                                                                                                                                                                                • boxedemp

                                                                                                                                                                                                                                                                  today at 10:33 AM

                                                                                                                                                                                                                                                                  That's super neat. I'll keep checking back to this site as new models are released. It's an interesting benchmark.

                                                                                                                                                                                                                                                                  • yesterday at 7:23 PM

                                                                                                                                                                                                                                                                    • S0y

                                                                                                                                                                                                                                                                      yesterday at 7:22 PM

                                                                                                                                                                                                                                                                      To be fair, This is a deceptively hard task.

                                                                                                                                                                                                                                                                        • bobbylarrybobby

                                                                                                                                                                                                                                                                          yesterday at 7:26 PM

                                                                                                                                                                                                                                                                          Without AI assistance, this should take ~10–15 minutes for a human. Maybe add 5 minutes if you're not allowed to use d3.

                                                                                                                                                                                                                                                                            • alexmorley

                                                                                                                                                                                                                                                                              yesterday at 7:39 PM

                                                                                                                                                                                                                                                                              It's just html/css so no js at all let alone d3.

                                                                                                                                                                                                                                                                              • postalrat

                                                                                                                                                                                                                                                                                yesterday at 8:38 PM

                                                                                                                                                                                                                                                                                Whats your hourly rate? I'll pay you to make as many as you can in a few hours if you share the video.

                                                                                                                                                                                                                                                                        • cornonthecobra

                                                                                                                                                                                                                                                                          yesterday at 9:32 PM

                                                                                                                                                                                                                                                                          I like Deepseek v3.1's idea of radially-aligning each hour number's y-axis ("1" is rotated 30° from vertical, "2" at 60°, etc.). It would be even better if the numbers were rotated anticlockwise.
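Below is a minimal sketch of that radial alignment in CSS, generated from Python for consistency with the other examples. It assumes each numeral is an absolutely positioned element whose top-left corner starts at the dial's center. `numeral_css` and the `.n1` to `.n12` class names are hypothetical. Rotating before translating makes each numeral's own y-axis point outward at 30° × n.

```python
def numeral_css(n, radius_px=150):
    """CSS rule placing hour numeral n (1-12) on the dial, rotated to point outward.

    Read left to right, each transform function acts on the element's local axes:
    translate(-50%, -50%) centers the numeral on the dial's center, rotate() turns
    its axes by 30 * n degrees, and translateY() pushes it outward along that
    rotated y-axis.
    """
    angle = 30 * (n % 12)  # 12 maps to 0 degrees, i.e. straight up
    return (f".n{n} {{ position: absolute; left: 50%; top: 50%; "
            f"transform: translate(-50%, -50%) rotate({angle}deg) translateY(-{radius_px}px); }}")


for n in range(1, 13):
    print(numeral_css(n))
```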

                                                                                                                                                                                                                                                                          I'm not sure what Qwen 2.5 is doing, but I've seen similar in contemporary art galleries.

                                                                                                                                                                                                                                                                          • paxys

                                                                                                                                                                                                                                                                            yesterday at 8:09 PM

                                                                                                                                                                                                                                                                            Something I'm not able to wrap my head around is that Kimi K2 is the only model that produces a ticking second hand on every attempt while the rest of them are always moving continuously. What fundamental differences in model training or implementation can result in this disparity? Or was this use case programmed in K2 after the fact?
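In CSS, the visible difference between a ticking and a sweeping second hand usually comes down to one timing function: `steps(60, end)` versus `linear`. The sketch below only shows how the two outputs differ. It says nothing about why one model consistently chooses `steps()`. The `second_hand_css` helper and the `spin` keyframe name are hypothetical.

```python
def second_hand_css(ticking, seconds_now=0):
    """CSS for a second hand; `ticking` selects discrete jumps over a smooth sweep.

    steps(60, end) splits the 60 s rotation into 60 jumps of 6 degrees each;
    linear interpolates continuously from 0 to 360 degrees.
    """
    timing = "steps(60, end)" if ticking else "linear"
    return (
        ".second { transform-origin: 50% 100%; "
        f"animation: spin 60s {timing} infinite; animation-delay: -{seconds_now}s; }}\n"
        "@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }"
    )


print(second_hand_css(ticking=True))
print(second_hand_css(ticking=False))
```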

                                                                                                                                                                                                                                                                            • edfletcher_t137

                                                                                                                                                                                                                                                                              yesterday at 11:38 PM

                                                                                                                                                                                                                                                                              Lack of Claude is a glaring oversight given how popular it is as an agentic coding model...

                                                                                                                                                                                                                                                                              • Vera_Wilde

                                                                                                                                                                                                                                                                                today at 7:04 AM

                                                                                                                                                                                                                                                                                It's really beautiful! Super clean UI.

                                                                                                                                                                                                                                                                                The thing I always want from timezone tools is: “Let me simulate a date after one side has shifted but the other hasn’t.”

                                                                                                                                                                                                                                                                                Humans do badly with DST offset transitions; computers do great with them.
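Below is a minimal sketch of the "one side has shifted but the other hasn't" check, using Python's standard `zoneinfo` module. It assumes the system IANA time zone database, or the `tzdata` package, is available, and `offset_gap_hours` is a hypothetical helper. In 2025 the US moved to daylight time on March 9 and the UK on March 30. For those three weeks, New York and London are 4 hours apart instead of 5.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def offset_gap_hours(instant_utc, zone_a, zone_b):
    """How many hours zone_b's local clock is ahead of zone_a's at a given instant."""
    offset_a = instant_utc.astimezone(ZoneInfo(zone_a)).utcoffset()
    offset_b = instant_utc.astimezone(ZoneInfo(zone_b)).utcoffset()
    return (offset_b - offset_a).total_seconds() / 3600


for day in (datetime(2025, 3, 1, 12, tzinfo=timezone.utc),    # both on standard time
            datetime(2025, 3, 20, 12, tzinfo=timezone.utc),   # only the US has shifted
            datetime(2025, 4, 5, 12, tzinfo=timezone.utc)):   # both have shifted
    print(day.date(), offset_gap_hours(day, "America/New_York", "Europe/London"))
# 2025-03-01 5.0
# 2025-03-20 4.0
# 2025-04-05 5.0
```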

                                                                                                                                                                                                                                                                                • Bengalilol

                                                                                                                                                                                                                                                                                  yesterday at 10:51 PM

                                                                                                                                                                                                                                                                                  Qwen doesn't care about clocks, it goes the Dali way, without melting.

                                                                                                                                                                                                                                                                                  It even made a Nietzsche clock (I saw one <body> </body> which was surprisingly empty).

                                                                                                                                                                                                                                                                                  It definitely wins the creative award.

                                                                                                                                                                                                                                                                                  • yesterday at 9:20 PM

                                                                                                                                                                                                                                                                                    • chaosprint

                                                                                                                                                                                                                                                                                      yesterday at 11:51 PM

                                                                                                                                                                                                                                                                                      This is such a great idea! Surprisingly, the Kimi K2 is the only one without any obvious problems. And it is even not the complete K2 thinking version? This made me reread this article from a few days ago:

                                                                                                                                                                                                                                                                                      https://entropytown.com/articles/2025-11-07-kimi-k2-thinking...

                                                                                                                                                                                                                                                                                      • Zeraous

                                                                                                                                                                                                                                                                                        today at 12:18 PM

                                                                                                                                                                                                                                                                                        How Kımı is better than other BILLION$ companys is really fun

                                                                                                                                                                                                                                                                                        • earth2mars

                                                                                                                                                                                                                                                                                          yesterday at 7:59 PM

https://gemini.google.com/share/00967146a995 works perfectly fine with Gemini 2.5 Pro.

                                                                                                                                                                                                                                                                                            • lanewinfield

                                                                                                                                                                                                                                                                                              yesterday at 8:02 PM

Nice. I restrict mine to 2,000 tokens; how many did that one use?

                                                                                                                                                                                                                                                                                              • esafak

                                                                                                                                                                                                                                                                                                yesterday at 9:37 PM

How do you do that?
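Restricting output is usually done per request rather than in the prompt. A minimal sketch, assuming a chat-completions-style JSON body: OpenAI's Chat Completions and Anthropic's Messages APIs both accept a `max_tokens` field, though parameter names vary by provider and model, and for reasoning models the thinking tokens may count against the same budget. The model name, the prompt wording and `buildClockRequest` are placeholders, not what the site actually sends.

```typescript
// Hypothetical request shape in the style of a chat-completions API.
interface ChatRequest {
  model: string;
  max_tokens: number; // hard cap on the number of tokens the model may generate
  messages: { role: "user"; content: string }[];
}

// Placeholder prompt wording; the site's real prompt is not reproduced here.
function buildClockRequest(model: string, time: string, maxTokens = 2000): ChatRequest {
  const prompt = `Create HTML/CSS of an analog clock showing ${time}. Return only the code.`;
  return {
    model,
    max_tokens: maxTokens,
    messages: [{ role: "user", content: prompt }],
  };
}

const req = buildClockRequest("example-model", "11:19AM");
console.log(JSON.stringify(req, null, 2));
```

Raising `maxTokens` is the knob people in this thread are asking about when they suggest a larger token allowance.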

                                                                                                                                                                                                                                                                                                  • agildehaus

                                                                                                                                                                                                                                                                                                    today at 3:42 AM

                                                                                                                                                                                                                                                                                                    I'm assuming the "Gemini 2.5" referenced on this site is Flash, not Pro. Pro is insane, and 3.0 is just around the corner.

                                                                                                                                                                                                                                                                                                    • earth2mars

                                                                                                                                                                                                                                                                                                      today at 12:41 AM

                                                                                                                                                                                                                                                                                                      I used exactly the same prompt this site uses. Nothing else.

                                                                                                                                                                                                                                                                                              • anonzzzies

                                                                                                                                                                                                                                                                                                today at 12:55 AM

Sonnet 4.5 does it flawlessly. I tried it eight times.

                                                                                                                                                                                                                                                                                                • ticulatedspline

                                                                                                                                                                                                                                                                                                  yesterday at 8:53 PM

This is cool. It's interesting to see how consistent some models are, in both success and failure.

I tried gpt-oss-20b (my go-to local model), and the result looks OK, though not very accurate. It decided to omit the numbers. It also used 4,500 tokens thinking.

I'd be interested in seeing it with more token leeway, as well as comparing two or more similar prompts, like using "current time" instead of "${time}" and being more prescriptive about including numbers.

                                                                                                                                                                                                                                                                                                  • collimarco

                                                                                                                                                                                                                                                                                                    yesterday at 7:25 PM

In any case, those clocks are all extremely inaccurate, even if AI could build a decent UI (which it can't).

Some months ago I published this site for fun: https://timeutc.com

There's a lot of code involved to make it precise to the millisecond, including adjusting for network delay, syncing to the frame refresh rate instead of using setTimeout, and much more. If you're curious, take a look at the source code.
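The network-delay adjustment can be illustrated with Cristian's algorithm, a standard technique: record the local clock just before and just after fetching the server's time, and assume the server stamped its reply halfway through the round trip. This is a sketch of that general technique, not timeutc.com's actual code; the `Sample` shape, the function names and the numbers are made up for illustration.

```typescript
// One round trip: local clock at send (t0) and at receive (t1), plus the
// server's timestamp from the response. All values in milliseconds.
interface Sample {
  t0: number;
  t1: number;
  server: number;
}

// Cristian-style estimate: assume the server stamped its reply at the midpoint
// of the round trip, so offset = server - (t0 + t1) / 2.
function estimateOffset(samples: Sample[]): { offset: number; rtt: number } {
  if (samples.length === 0) throw new Error("need at least one sample");
  // Prefer the shortest round trip: it leaves the least room for asymmetric delay.
  let best = samples[0];
  for (const s of samples) {
    if (s.t1 - s.t0 < best.t1 - best.t0) best = s;
  }
  return {
    offset: best.server - (best.t0 + best.t1) / 2,
    rtt: best.t1 - best.t0,
  };
}

// Corrected "now" = local clock reading + estimated offset.
function correctedNow(localNow: number, offset: number): number {
  return localNow + offset;
}

const est = estimateOffset([
  { t0: 1000, t1: 1100, server: 5050 },
  { t0: 2000, t1: 2020, server: 6012 },
]);
console.log(est); // { offset: 4002, rtt: 20 }
```

A `requestAnimationFrame` loop, as described above, would then redraw each frame from `correctedNow(Date.now(), offset)`; `t0` and `t1` should also come from `Date.now()` so every reading uses the same clock. Picking the sample with the shortest round trip keeps the error from asymmetric network delay to roughly half that round trip.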

                                                                                                                                                                                                                                                                                                    • 3oil3

                                                                                                                                                                                                                                                                                                      today at 5:47 AM

I wonder which model will silently be updated and suddenly start drawing clocks with Audemars Piguet-level complications.

                                                                                                                                                                                                                                                                                                      • amelius

                                                                                                                                                                                                                                                                                                        yesterday at 8:35 PM

                                                                                                                                                                                                                                                                                                        Maybe they can ask Sora to make variations of:

                                                                                                                                                                                                                                                                                                        https://slate.com/human-interest/2016/07/martin-baas-giant-r...

                                                                                                                                                                                                                                                                                                        • shahzaibmushtaq

                                                                                                                                                                                                                                                                                                          today at 6:13 AM

Interesting idea!

Why is a new clock rendered every minute? Are the AI models evolving and improving every minute?

                                                                                                                                                                                                                                                                                                          • bwhiting2356

                                                                                                                                                                                                                                                                                                            today at 4:10 AM

You should render it, show an image of the result to the model, and let it iterate. No person has to one-shot code without seeing what it looks like.

                                                                                                                                                                                                                                                                                                            • rtcode_io

                                                                                                                                                                                                                                                                                                              yesterday at 9:41 PM

                                                                                                                                                                                                                                                                                                              See https://clock.rt.ht/::code

                                                                                                                                                                                                                                                                                                              AI-optimized <analog-clock>!

People expect perfection on the first attempt. This took a brief joint session:

                                                                                                                                                                                                                                                                                                              HI: define the custom element API design (attribute/property behavior) and the CSS parts

                                                                                                                                                                                                                                                                                                              AI: draw the rest of the f… owl
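Whatever the element's API looks like, every analog clock, including the ones the models draw, has to map the time onto hand angles correctly. A small sketch of that mapping, assuming CSS `rotate()` with 0° at 12 o'clock and angles increasing clockwise; `handAngles` is a hypothetical helper, not code from the linked element.

```typescript
// Angles in degrees, clockwise from 12 o'clock, as CSS rotate() expects.
interface HandAngles {
  hour: number;
  minute: number;
  second: number;
}

function handAngles(h: number, m: number, s = 0): HandAngles {
  return {
    // 360 / 12 = 30° per hour, plus 0.5° per minute (and a little per second)
    // so the hour hand creeps toward the next hour mark.
    hour: (h % 12) * 30 + m * 0.5 + s * (0.5 / 60),
    // 360 / 60 = 6° per minute, plus 0.1° per second.
    minute: m * 6 + s * 0.1,
    // 6° per second.
    second: s * 6,
  };
}

// A hand element rotated around the dial's center could use these directly.
const a = handAngles(11, 19);
console.log(`rotate(${a.hour}deg)`, `rotate(${a.minute}deg)`); // rotate(339.5deg) rotate(114deg)
```

The `m * 0.5` term is what puts the hour hand partway between the 11 and the 12 at 11:19 instead of sitting exactly on the 11.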

                                                                                                                                                                                                                                                                                                                • speedgoose

                                                                                                                                                                                                                                                                                                                  today at 11:39 AM

                                                                                                                                                                                                                                                                                                                  This is a white page, am I missing something?

                                                                                                                                                                                                                                                                                                              • baidoct

                                                                                                                                                                                                                                                                                                                today at 11:45 AM

                                                                                                                                                                                                                                                                                                                GPT-5 looks broken

                                                                                                                                                                                                                                                                                                                • wewtyflakes

                                                                                                                                                                                                                                                                                                                  today at 4:33 AM

                                                                                                                                                                                                                                                                                                                  It is funny to see the performance improve across many of the models, somewhat miraculously, throughout the day today.

                                                                                                                                                                                                                                                                                                                  • nasir

                                                                                                                                                                                                                                                                                                                    yesterday at 8:53 PM

Where's Opus/Sonnet? I'm very curious about that!

                                                                                                                                                                                                                                                                                                                    • syx

                                                                                                                                                                                                                                                                                                                      yesterday at 6:57 PM

I’m very curious about the monthly bill for such a creative project. Surely some of these are pre-rendered?

                                                                                                                                                                                                                                                                                                                        • coffeecoders

                                                                                                                                                                                                                                                                                                                          yesterday at 7:22 PM

                                                                                                                                                                                                                                                                                                                          Napkin math:

                                                                                                                                                                                                                                                                                                                          9 AIs × 43,200 minutes = 388,800 requests/month

                                                                                                                                                                                                                                                                                                                          388,800 requests × 200 tokens = 77,760,000 tokens/month ≈ 78M tokens

                                                                                                                                                                                                                                                                                                                          Cost varies from 10 cents to $1 per 1M tokens.

Using the midpoint price ($0.55 per 1M tokens), that's roughly $43/month.
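The same napkin math as a small function, so the inputs can be varied. The 200 tokens per request and the $0.10 to $1 per 1M token price range are the commenter's assumptions, and the 30-day month matches the 43,200 minutes above; `monthlyCostUSD` is a hypothetical name.

```typescript
// Monthly request count, token count and cost for a fixed per-token price.
function monthlyCostUSD(
  models: number,
  requestsPerModelPerMonth: number,
  tokensPerRequest: number,
  usdPerMillionTokens: number,
): { requests: number; tokens: number; cost: number } {
  const requests = models * requestsPerModelPerMonth;
  const tokens = requests * tokensPerRequest;
  return { requests, tokens, cost: (tokens / 1_000_000) * usdPerMillionTokens };
}

const minutesPerMonth = 30 * 24 * 60; // 43,200: one request per model per minute
const low = monthlyCostUSD(9, minutesPerMonth, 200, 0.1);
const mid = monthlyCostUSD(9, minutesPerMonth, 200, 0.55);
const high = monthlyCostUSD(9, minutesPerMonth, 200, 1.0);

console.log(mid.requests, mid.tokens); // 388800 77760000
console.log(low.cost.toFixed(2), mid.cost.toFixed(2), high.cost.toFixed(2));
```

Plugging in the 2,000-token cap lanewinfield mentions as `tokensPerRequest` multiplies every figure by ten (about $428/month at the midpoint), so the real bill depends heavily on how long the responses actually are.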

                                                                                                                                                                                                                                                                                                                          ---

Hopefully the OP has this endpoint protected: https://clocks.brianmoore.com/api/clocks?time=11:19AM

                                                                                                                                                                                                                                                                                                                            • whimsicalism

                                                                                                                                                                                                                                                                                                                              yesterday at 8:40 PM

I think it's cached at the minute level; responses can't be that fast.
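If it is cached per minute, one way to do that is to memoize generation on the minute string, so every visitor during that minute reuses one set of model calls. A minimal sketch, assuming a hypothetical `generate` function that calls the models; this is not the site's code. A cache keyed on a client-supplied `time` parameter only caps cost if the server also refuses times other than the current minute; otherwise each new value still triggers fresh model calls.

```typescript
// Hypothetical: whatever actually asks the nine models for clocks at `time`.
type Generate = (time: string) => Promise<string[]>;

// Memoize generation per minute string (e.g. "11:19AM"). The first request for
// a minute starts generation; later and concurrent requests share that promise.
function minuteCache(generate: Generate, maxEntries = 60) {
  const cache = new Map<string, Promise<string[]>>();
  return (time: string): Promise<string[]> => {
    const cached = cache.get(time);
    if (cached) return cached;

    const p = generate(time);
    cache.set(time, p);

    // Map iterates in insertion order, so the first key is the oldest entry.
    if (cache.size > maxEntries) {
      const oldest = cache.keys().next();
      if (!oldest.done) cache.delete(oldest.value);
    }

    // Drop failed generations so the next request for that minute retries.
    p.catch(() => {
      if (cache.get(time) === p) cache.delete(time);
    });
    return p;
  };
}
```

Sharing the in-flight promise, rather than caching only finished results, also stops a burst of simultaneous visitors at the top of a minute from each triggering their own model calls.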

                                                                                                                                                                                                                                                                                                                      • josfredo

                                                                                                                                                                                                                                                                                                                        today at 4:54 AM

                                                                                                                                                                                                                                                                                                                        Watching these gives me a strong feeling of unease. Art-wise, it is a very beautiful project.

                                                                                                                                                                                                                                                                                                                        • whimsicalism

                                                                                                                                                                                                                                                                                                                          yesterday at 8:39 PM

Kimi K2 is obviously the best, but GPT-5 has the most gorgeous ones when it works.

                                                                                                                                                                                                                                                                                                                          • orly01

                                                                                                                                                                                                                                                                                                                            yesterday at 8:41 PM

                                                                                                                                                                                                                                                                                                                            What does it mean that each model is allowed 2000 tokens to generate its clock?

                                                                                                                                                                                                                                                                                                                            • bigbluedots

                                                                                                                                                                                                                                                                                                                              today at 12:45 AM

                                                                                                                                                                                                                                                                                                                              I just realized I'm running late, it's almost -2!

                                                                                                                                                                                                                                                                                                                              More seriously, I'd love to see how the models perform the same task with a larger token allowance.

                                                                                                                                                                                                                                                                                                                              • kfarr

                                                                                                                                                                                                                                                                                                                                yesterday at 6:50 PM

                                                                                                                                                                                                                                                                                                                                Add some voting and you got yourself an AI World Clock arena! https://artificialanalysis.ai/image/arena

                                                                                                                                                                                                                                                                                                                                  • BrandoElFollito

                                                                                                                                                                                                                                                                                                                                    yesterday at 8:27 PM

                                                                                                                                                                                                                                                                                                                                    Thank you very much.... It was a fun game until I got to the prompt

                                                                                                                                                                                                                                                                                                                                    Place a baby elephant in the green chair

                                                                                                                                                                                                                                                                                                                                    I cannot unsee what I saw and it is 21:30 here so I have an hour or so to eliminate the picture from my mind or I will have nightmares.

                                                                                                                                                                                                                                                                                                                                • hansmayer

                                                                                                                                                                                                                                                                                                                                  yesterday at 9:18 PM

                                                                                                                                                                                                                                                                                                                                  Very funny. It seems the Qwen generates the funniest outputs :)

                                                                                                                                                                                                                                                                                                                                    • csours

                                                                                                                                                                                                                                                                                                                                      yesterday at 9:44 PM

                                                                                                                                                                                                                                                                                                                                      Oh, Qwen, buddy, you sure are TRYING

                                                                                                                                                                                                                                                                                                                                  • fschuett

                                                                                                                                                                                                                                                                                                                                    yesterday at 7:22 PM

                                                                                                                                                                                                                                                                                                                                    Reminds me of this: https://www.youtube.com/watch?v=OGbhJjXl9Rk

                                                                                                                                                                                                                                                                                                                                    • aavshr

                                                                                                                                                                                                                                                                                                                                      yesterday at 8:14 PM

                                                                                                                                                                                                                                                                                                                                      just curious, why not the sonnet models? In my personal experience, Anthropic's Sonnet models are the best when it comes to things like this!

                                                                                                                                                                                                                                                                                                                                      • xyproto

                                                                                                                                                                                                                                                                                                                                        yesterday at 8:17 PM

                                                                                                                                                                                                                                                                                                                                        Try adding to the prompt that it has a PhD in Computer Science and have many methods for dealing with complexity.

                                                                                                                                                                                                                                                                                                                                        This gives better results, at least for me.

                                                                                                                                                                                                                                                                                                                                          • bigfishrunning

                                                                                                                                                                                                                                                                                                                                            yesterday at 9:41 PM

                                                                                                                                                                                                                                                                                                                                            Why does that give better results? Is this phenomena measurable? How would "you have a phd in computer science" change its ability to interpret prose? Every interaction with an LLM seems like superstition.

                                                                                                                                                                                                                                                                                                                                          • maxdo

                                                                                                                                                                                                                                                                                                                                            yesterday at 10:44 PM

                                                                                                                                                                                                                                                                                                                                            Selection of western models is weird no gpt-5.1 , opus 4.1 ( nailed it perfectly ) Something I quickly tested

                                                                                                                                                                                                                                                                                                                                            • warpspin

                                                                                                                                                                                                                                                                                                                                              today at 12:55 PM

                                                                                                                                                                                                                                                                                                                                              Lol. This is supposed to replace me at my job already?

                                                                                                                                                                                                                                                                                                                                              Great experiment!

                                                                                                                                                                                                                                                                                                                                              • bongodongobob

                                                                                                                                                                                                                                                                                                                                                yesterday at 8:28 PM

                                                                                                                                                                                                                                                                                                                                                Weird. Sonnet 4.5 one shotted it with:

                                                                                                                                                                                                                                                                                                                                                Create an interactive artifact of an analog clock face that keeps time properly.

                                                                                                                                                                                                                                                                                                                                                https://claude.ai/public/artifacts/75daae76-3621-4c47-a684-d...

                                                                                                                                                                                                                                                                                                                                                • stym06

                                                                                                                                                                                                                                                                                                                                                  today at 4:45 AM

                                                                                                                                                                                                                                                                                                                                                  If a human had done this, these would be at a museum

                                                                                                                                                                                                                                                                                                                                                      • esotericwarfare

                                                                                                                                                                                                                                                                                                                                                        today at 12:33 AM

                                                                                                                                                                                                                                                                                                                                                        This is an AD for Kimi K2

                                                                                                                                                                                                                                                                                                                                                        • __fst__

                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:27 PM

                                                                                                                                                                                                                                                                                                                                                          This is why we need TeraWatt DCs, to generate code for world clocks every minute.

                                                                                                                                                                                                                                                                                                                                                          • HarHarVeryFunny

                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:56 PM

                                                                                                                                                                                                                                                                                                                                                            Looks like we've got a new Turing test here: "draw me a clock"

                                                                                                                                                                                                                                                                                                                                                            • ada1981

                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:29 PM

                                                                                                                                                                                                                                                                                                                                                              Sonnet 4.5 did this easily https://claude.ai/public/artifacts/c1bb5d57-573b-49e0-9539-7...

                                                                                                                                                                                                                                                                                                                                                              • bigbluedots

                                                                                                                                                                                                                                                                                                                                                                today at 12:50 AM

                                                                                                                                                                                                                                                                                                                                                                Is there a "draw a pelican riding a bicycle" version?

                                                                                                                                                                                                                                                                                                                                                              • zkmon

                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:24 PM

                                                                                                                                                                                                                                                                                                                                                                Was Claude banned from this Olympics?

                                                                                                                                                                                                                                                                                                                                                                  • giancarlostoro

                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:04 PM

                                                                                                                                                                                                                                                                                                                                                                    Haiku is the lightweight Claude model, I'm not sure why they picked the weaker model.

                                                                                                                                                                                                                                                                                                                                                                • abathologist

                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                  This is great. If you think that the phenomena of human-like text generation evinces human-like intelligence, then this should be taken to evince that the systems likely have dementia. https://en.wikipedia.org/wiki/Montreal_Cognitive_Assessment

                                                                                                                                                                                                                                                                                                                                                                    • AIorNot

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:20 PM

                                                                                                                                                                                                                                                                                                                                                                      Imagine if I asked you to draw as pixels and operate a clock via html or create a jpeg with a pencil and paper and have it be accurate.. I suspect your handcoded work to be off by an order of magnitutde compared

                                                                                                                                                                                                                                                                                                                                                                  • accrual

                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:07 PM

                                                                                                                                                                                                                                                                                                                                                                    I love that GPT-5 is putting the clock hands way outside the frame and just generally is a mess. Maybe we'll look back on these mistakes just like watching kids grow up and fumble basic tasks. Humorous in its own unique way.

                                                                                                                                                                                                                                                                                                                                                                      • palmotea

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:10 PM

                                                                                                                                                                                                                                                                                                                                                                        > Maybe we'll look back on these hilarious mistakes just like watching kids grow up and fumble basic tasks.

                                                                                                                                                                                                                                                                                                                                                                        Or regret: "why didn't we stop it when we could?"

                                                                                                                                                                                                                                                                                                                                                                    • Imanari

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:22 PM

                                                                                                                                                                                                                                                                                                                                                                      Qwens clocks are hilarious

                                                                                                                                                                                                                                                                                                                                                                      • Waterluvian

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:44 PM

                                                                                                                                                                                                                                                                                                                                                                        How do they do time without JavaScript? Is there an API I’m not aware of?

                                                                                                                                                                                                                                                                                                                                                                          • bloppe

                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:46 PM

                                                                                                                                                                                                                                                                                                                                                                            CSS animation. It's not the real time. Just a hypothetical time.
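A minimal sketch of how a CSS-only clock like the ones bloppe describes can work, assuming the generated page bakes a fixed starting time into its stylesheet and draws each hand (hypothetical classes `.hour`, `.minute`, `.second`) pointing at 12 before rotation. The helpers `handAngles` and `cssClock` are hypothetical and are not the site's actual generated code. Each hand spins at a constant rate, and a negative `animation-delay` starts it part-way through its cycle, so the hands begin at the baked-in time and then move forward with no JavaScript reading the real clock. That is why the displayed time is "hypothetical".

```typescript
// Angles in degrees, clockwise from 12 o'clock.
interface HandAngles { hour: number; minute: number; second: number; }

// Hypothetical helper: where each hand should point at h:m:s.
function handAngles(h: number, m: number, s: number): HandAngles {
  return {
    hour: (h % 12) * 30 + m * 0.5 + s * (0.5 / 60), // 30 deg per hour, drifts with minutes/seconds
    minute: m * 6 + s * 0.1,                         // 6 deg per minute
    second: s * 6,                                   // 6 deg per second
  };
}

// Hypothetical helper: emits CSS that rotates each hand once per period.
// Assumes each hand element is drawn pointing at 12 before rotation.
function cssClock(h: number, m: number, s: number): string {
  const a = handAngles(h, m, s);
  // Seconds per full turn: 12 hours, 1 hour, 1 minute.
  const periods = { hour: 43200, minute: 3600, second: 60 };
  const rules = (Object.keys(periods) as (keyof HandAngles)[]).map((hand) => {
    const period = periods[hand];
    // Negative delay = start this far into the cycle, i.e. at the baked-in angle.
    const delay = -(a[hand] / 360) * period;
    return `.${hand} { animation: spin ${period}s linear ${delay.toFixed(2)}s infinite; }`;
  });
  return [
    "@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }",
    ...rules,
  ].join("\n");
}

// Usage: CSS for a clock that starts at 10:10:30 and runs from there.
console.log(cssClock(10, 10, 30));
```

Because the starting time is fixed when the page is generated, such a clock is only correct if the page is rendered at that moment. A clock that shows the real time on every load would need something like JavaScript reading `Date` instead.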

                                                                                                                                                                                                                                                                                                                                                                              • Waterluvian

                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:51 PM

                                                                                                                                                                                                                                                                                                                                                                                I’m imagining some must be using JS because I’m seeing (rarely…) times that are perfectly correct.

                                                                                                                                                                                                                                                                                                                                                                                  • bloppe

                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:04 PM

                                                                                                                                                                                                                                                                                                                                                                                    Actually you're right. If you view source, you can see `const response = await fetch(`/api/clocks?time=${encodeURIComponent(localTime)}`);`. I'm not sure how that API works, but it's definitely reading the current time using JS, then somehow embedding it in the HTML / CSS of each LLM.

                                                                                                                                                                                                                                                                                                                                                                                    • vultour

                                                                                                                                                                                                                                                                                                                                                                                      today at 12:13 AM

                                                                                                                                                                                                                                                                                                                                                                                      It's crafted with a prompt that gives the AI the current time, then it simply refreshes every minute so the seconds start at zero correctly.
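Below is a minimal sketch of the client side described above. It assumes the page formats local time as HH:MM and re-requests when each new minute starts. Only the `/api/clocks?time=` query and the `encodeURIComponent` call come from the page source quoted above. The helper names and the time format are hypothetical.

```javascript
// Hypothetical helpers: build the request URL for the current local time
// and compute how long to wait until the next whole minute.

// Format a Date as "HH:MM" in local time (assumed format, not confirmed).
function formatLocalTime(date) {
  const hh = String(date.getHours()).padStart(2, "0");
  const mm = String(date.getMinutes()).padStart(2, "0");
  return `${hh}:${mm}`;
}

// Mirrors the shape of the fetch URL seen in the page source.
function buildClockUrl(date) {
  const localTime = formatLocalTime(date);
  return `/api/clocks?time=${encodeURIComponent(localTime)}`;
}

// Milliseconds until the next minute boundary, so a freshly generated
// clock can start its second hand at zero.
function msUntilNextMinute(date) {
  return 60000 - (date.getSeconds() * 1000 + date.getMilliseconds());
}
```

Usage would look something like `setTimeout(refresh, msUntilNextMinute(new Date()))`, where `refresh` is a hypothetical function that re-fetches the clocks.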

                                                                                                                                                                                                                                                                                                                                                                              • bhandziuk

                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:07 PM

Looks like CSS keyframes.
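For the keyframes idea, here is a sketch of the hand geometry each model has to get right. It assumes that each hand rotates about the clock centre, that 0° points at 12 o'clock, and that the second hand uses a 60 s linear, infinite rotation. The class names and the `sweep` animation name are hypothetical.

```javascript
// Hand angles in degrees, measured clockwise from 12 o'clock.
function handAngles(hours, minutes, seconds = 0) {
  return {
    hour: (hours % 12) * 30 + minutes * 0.5 + seconds * (0.5 / 60),
    minute: minutes * 6 + seconds * 0.1,
    second: seconds * 6,
  };
}

// Build a CSS snippet: static hour/minute hands plus a second hand
// animated with @keyframes over 60 seconds.
function clockCss(hours, minutes) {
  const a = handAngles(hours, minutes);
  return [
    `.hour-hand { transform: rotate(${a.hour}deg); }`,
    `.minute-hand { transform: rotate(${a.minute}deg); }`,
    "@keyframes sweep { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }",
    ".second-hand { animation: sweep 60s linear infinite; }",
  ].join("\n");
}
```

This assumes each hand also has its `transform-origin` set at the clock centre. The hour and minute hands stay static, which works only because a new clock replaces this one every minute.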

                                                                                                                                                                                                                                                                                                                                                                            • busymom0

                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:12 PM

Because a new clock is generated every minute, it looks like simply changing the time by a digit makes the result significantly different from the previous iteration.

                                                                                                                                                                                                                                                                                                                                                                              • kwanbix

                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:06 PM

                                                                                                                                                                                                                                                                                                                                                                                What a waste of energy.

                                                                                                                                                                                                                                                                                                                                                                                • 0xCE0

                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:44 PM

                                                                                                                                                                                                                                                                                                                                                                                  Seems like Will's clock drawing test in Hannibal :)

                                                                                                                                                                                                                                                                                                                                                                                  • woopwoop

                                                                                                                                                                                                                                                                                                                                                                                    today at 4:52 AM

                                                                                                                                                                                                                                                                                                                                                                                    The qwen clocks are art.

                                                                                                                                                                                                                                                                                                                                                                                    • ssl-3

                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:51 PM

                                                                                                                                                                                                                                                                                                                                                                                      This really needs to be an xscreensaver hack.

                                                                                                                                                                                                                                                                                                                                                                                      • JamesAdir

                                                                                                                                                                                                                                                                                                                                                                                        today at 8:11 AM

I believe that in a day or two the companies will address this, and it will be solved for that use case.

                                                                                                                                                                                                                                                                                                                                                                                        • gloosx

                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:35 PM

Anyone tried opening this on mobile? Not a single clock renders correctly; it almost looks like a joke on LLMs.

                                                                                                                                                                                                                                                                                                                                                                                          • jcmontx

                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:42 PM

Grok is impressive; I should give it a shot.

                                                                                                                                                                                                                                                                                                                                                                                            • AlfredBarnes

                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:05 PM

It's cool to see them get it right... sometimes.

                                                                                                                                                                                                                                                                                                                                                                                              • miohtama

                                                                                                                                                                                                                                                                                                                                                                                                today at 12:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                The new Turing time test


                                                                                                                                                                                                                                                                                                                                                                                                  • hollow-moe

                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:57 PM

Obviously they're all broken on Firefox; no one uses Firefox anyway.

                                                                                                                                                                                                                                                                                                                                                                                                    • mstipetic

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                      GPT-5 is embarrassing itself. Kimi and DeepSeek are very consistently good. Wild that you can just download these models.

                                                                                                                                                                                                                                                                                                                                                                                                      • bananatron

                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:02 PM

Grok's looks like one of those clocks you'd find at a novelty shop.

                                                                                                                                                                                                                                                                                                                                                                                                        • shubham_zingle

                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:27 PM

Not sure about the accuracy, though; I'm shooting in the dark.

                                                                                                                                                                                                                                                                                                                                                                                                          • lxe

                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                            Honestly, I think if you track the performance of each over time, since these get regenerated once in a while, you can then have a very, very useful and cohesive benchmark.
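Here is one way such tracking could work as a benchmark, sketched under stated assumptions. It assumes the hour and minute hand angles can be extracted from each render, which the site does not expose. It also assumes a render passes when both hands fall within a chosen tolerance. The record shape, the function names, and the 6° tolerance are all hypothetical.

```javascript
// Smallest difference between two angles, in degrees (0..180).
function angularError(a, b) {
  const d = Math.abs(a - b) % 360;
  return d > 180 ? 360 - d : d;
}

// Hypothetical rule: a render passes if both hour and minute hands
// are within tolDeg of the expected angles.
function passes(expected, actual, tolDeg = 6) {
  return (
    angularError(expected.hour, actual.hour) <= tolDeg &&
    angularError(expected.minute, actual.minute) <= tolDeg
  );
}

// Aggregate pass rate per model from hypothetical records shaped like
// { model, expected: { hour, minute }, actual: { hour, minute } }.
function passRateByModel(records) {
  const totals = new Map();
  for (const r of records) {
    const t = totals.get(r.model) ?? { pass: 0, n: 0 };
    t.n += 1;
    if (passes(r.expected, r.actual)) t.pass += 1;
    totals.set(r.model, t);
  }
  const out = {};
  for (const [model, t] of totals) out[model] = t.pass / t.n;
  return out;
}
```

The circular difference matters here. A minute hand drawn at 358° for an expected 0° is only 2° off, not 358°.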

                                                                                                                                                                                                                                                                                                                                                                                                            • cyberjill

                                                                                                                                                                                                                                                                                                                                                                                                              today at 3:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                              666

                                                                                                                                                                                                                                                                                                                                                                                                              • larodi

                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:02 PM

Would be great to also see the prompt this was done with.

                                                                                                                                                                                                                                                                                                                                                                                                                  • creade

                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:04 PM

The ? button shows the prompt: "Create HTML/CSS of an analog clock showing ${time}. Include numbers (or numerals) if you wish, and have a CSS animated second hand. Make it responsive and use a white background. Return ONLY the HTML/CSS code with no markdown formatting."
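The quoted prompt is a template with a single `${time}` placeholder. Here is a sketch of filling it, assuming `time` is a plain string such as "10:42". The thread does not show how the site actually formats the time, and `buildClockPrompt` is a hypothetical name.

```javascript
// Prompt text as quoted from the site's "?" panel; only ${time} varies.
function buildClockPrompt(time) {
  return (
    `Create HTML/CSS of an analog clock showing ${time}. ` +
    "Include numbers (or numerals) if you wish, and have a CSS animated second hand. " +
    "Make it responsive and use a white background. " +
    "Return ONLY the HTML/CSS code with no markdown formatting."
  );
}
```

Because only the time string changes between requests, each minute's clocks come from nearly identical prompts. That makes the differences between consecutive renders a fairly direct view of each model's variability.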

                                                                                                                                                                                                                                                                                                                                                                                                                • imchillyb

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 2:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                  I love qwen, it tries so hard with its little paddle and never gets anywhere.

                                                                                                                                                                                                                                                                                                                                                                                                                  • 1yvino

                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:18 PM

I wonder, would the Qwen prompt look like a hallucination?

                                                                                                                                                                                                                                                                                                                                                                                                                    • bitwize

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                      I'm reminded of the "draw a clock" test neurologists use to screen for dementia and brain damage.

                                                                                                                                                                                                                                                                                                                                                                                                                      • teaearlgraycold

                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                        Qwen 2.5 doing a surprisingly good job (as of right now).

                                                                                                                                                                                                                                                                                                                                                                                                                        • DeathArrow

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          How can Deepseek and Kimi get it right while Haiku, Gemini and GPT are making a mess?

                                                                                                                                                                                                                                                                                                                                                                                                                          • eastbound

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            Security-wise, this is a website that takes the straight output of AI and serves it for execution on their website.

I know developers do the same, but at least they check it into Git, where they can notice their mistakes. Here is an opportunity for the AI to trigger a Google Authentication prompt on you, or anything else.

                                                                                                                                                                                                                                                                                                                                                                                                                            • bpt3

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              It's wild how much the output varies for the same model for each run.

                                                                                                                                                                                                                                                                                                                                                                                                                              I'm not sure if this was the intent or not, but it sure highlights how unreliable LLMs are.

                                                                                                                                                                                                                                                                                                                                                                                                                              • novemp

                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                Oh cool, it's the schizophrenia clock-drawing test but for AI.

                                                                                                                                                                                                                                                                                                                                                                                                                                • system2

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  Ask Claude or ChatGPT to write it in Python, and you will see what they are capable of. HTML + CSS has never been the strong suit of any of these models.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • camalouu

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:13 PM

Claude generates some JS/CSS stuff even when I don't ask for it. I think Claude itself at least believes it is good at this.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • shevy-java

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    Now that is actually creative.

                                                                                                                                                                                                                                                                                                                                                                                                                                    Granted, it is not a clock - but it could be art. It looks like a Picasso. When he was drunk. And took some LSD.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • jonplackett

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      kimi is kicking ass

                                                                                                                                                                                                                                                                                                                                                                                                                                      • fnord77

                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 2:05 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                        whatever model Cursor uses was telling me the date was March 12, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                        • surfingdino

                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                          What a wonderfully visual example of the crap LLMs turn everything into. I am eagerly awaiting the collapse of the LLM bubble. JetBrains added this crap to their otherwise fine series of IDEs and now I have to keep removing randomly inserted import statements and keep fixing hallucinated names of functions suggested instead of the names of functions that I have already defined in the same file. Lack of determinism where we expect it (most of the things we do, tbh) is creating more problems than it is solving.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • jsmo

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:05 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                            lol

                                                                                                                                                                                                                                                                                                                                                                                                                                            • 10/04/2025

                                                                                                                                                                                                                                                                                                                                                                                                                                              • Gormanu

                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                • superlukas99

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 3:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                  • PeterStuer

                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:59 PM

Why? This is diagonal to how LLMs work, and trivially solved by a minimal hybrid front/sub system.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • bayindirh

                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:03 PM

Because LLMs are touted as the silver bullet of silver bullets. Built upon the world's knowledge, and with the capacity to call upon updated information with agents, they ought to have rivaled the top programmers 3 days ago.

                                                                                                                                                                                                                                                                                                                                                                                                                                                          • awkwam

                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:29 PM

They might be touted like that, but it seems like you don't understand how they work. The example in the article shows that the prompt limits the LLM to only 2000 tokens and also says "ONLY OUTPUT ...". This is like me asking you to solve the same problem but forcing you to deactivate half of your brain and forget any programming experience you have. It's just stupid.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bayindirh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                > like you don't understand how they work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                I would not make such assumptions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                > The example in the article shows that the prompt is limiting the LLM by giving it access to only 2000 tokens and also saying "ONLY OUTPUT ..."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                The site is pretty simple, method is pretty straightforward. If you believe this is unfair, you can always build one yourself.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                > It's just stupid.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                No, it's a great way of testing things within constraints.

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • em3rgent0rdr

                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          To gauge.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • awkwam

                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:37 PM

Limiting the model to only 2000 tokens while also asking it to output ONLY HTML/CSS is just stupid. It's like asking a programmer to perform the same task after removing half their brain and forgetting all their programming experience. This is a stupid and meaningless benchmark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • kburman

                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:17 PM

These types of tests are fundamentally flawed. I was able to create a perfect clock using Gemini 2.5 Pro - https://gemini.google.com/share/136f07a0fa78

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Drew_

                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                              The website is regenerating the clocks every minute. When I opened it, Gemini 2.5 was the only working one. Now, they are all broken.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              Also, your example is not showing the current time.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • system2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:42 PM

It wouldn't be hard to tell it to use the browser time as the default starting point. It's just one more line in the prompt.
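As a rough illustration of that point, here is a minimal TypeScript sketch (the `handAngles` helper is hypothetical, not taken from the site or any model's output) that reads the browser's current time with `Date` and converts it into hand angles, assuming 0° points at 12 o'clock and angles increase clockwise:

```typescript
// Hypothetical helper: hand angles in degrees, 0° = 12 o'clock, clockwise.
interface HandAngles {
  hour: number;
  minute: number;
  second: number;
}

function handAngles(now: Date): HandAngles {
  // Fractional seconds/minutes/hours so each hand moves smoothly between marks.
  const s = now.getSeconds() + now.getMilliseconds() / 1000;
  const m = now.getMinutes() + s / 60;
  const h = (now.getHours() % 12) + m / 60;
  return {
    hour: h * 30, // 360° / 12 hours
    minute: m * 6, // 360° / 60 minutes
    second: s * 6, // 360° / 60 seconds
  };
}

// Start from the browser's current local time rather than midnight/noon.
const angles = handAngles(new Date());
console.log(angles);
```

Applying these angles with something like CSS `rotate()` on each hand would make the clock open at the real time instead of 12:00.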

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • yesterday at 7:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dwringer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Even Gemini Flash did really well for me[0] using two prompts - the initial query and one to fix the only error I could identify.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Please generate an analog clock widget, synchronized to actual system time, with hands that update in real time and a second hand that ticks at least once per second. Make sure all the hour markings are visible and put some effort into making a modern, stylish clock face.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Followed by:

> Currently the hands are working perfectly but they're translated incorrectly, making them uncentered. Can you ensure that each one is translated to the correct position on the clock face?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  [0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
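Uncentered hands like the ones described in the follow-up prompt often come from rotating each hand about the wrong pivot point; that is an assumption, since the generated code isn't shown. A minimal TypeScript sketch for an SVG-style face, with hypothetical helper names: `handEnd` computes the tip of a hand that pivots on the face center, and `scheduleTicks` re-aligns each tick to the next whole second so the second hand ticks once per second, as the first prompt asks.

```typescript
// Hypothetical helpers for an SVG-style clock face.
interface Point {
  x: number;
  y: number;
}

// Tip of a hand of `length` that pivots on the face center (cx, cy).
// angleDeg: 0° = 12 o'clock, increasing clockwise; SVG's y axis points down.
function handEnd(cx: number, cy: number, length: number, angleDeg: number): Point {
  const rad = (angleDeg * Math.PI) / 180;
  return {
    x: cx + length * Math.sin(rad),
    y: cy - length * Math.cos(rad),
  };
}

// Milliseconds until the next whole second on the system clock.
function msUntilNextSecond(now: Date): number {
  return 1000 - now.getMilliseconds();
}

// Re-schedule after every tick, aligned to the next second boundary,
// so ticks stay in step with the real second (a fixed setInterval
// does not guarantee that alignment).
function scheduleTicks(onTick: (now: Date) => void): void {
  const tick = () => {
    const now = new Date();
    onTick(now);
    setTimeout(tick, msUntilNextSecond(now));
  };
  tick();
}
```

Each hand would then be drawn as a line from `(cx, cy)` to `handEnd(...)`, so it can never drift off-center regardless of how the face is styled.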

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • allenu

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I don't think this is a serious test. It's just an art piece to contrast different LLMs taking on the same task, and against themselves since it updates every minute. One minute one of the results was really good for me and the next minute it was very, very bad.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jmdeon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Aren't they attempting to also display current time though? Your share is a clock starting at midnight/noon. Kimi K2 seems to be the best on each refresh.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • sinak

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        How are they flawed?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • earthnail

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:19 PM

The results are not reproducible, as evidenced by the parent poster.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • micromacrofoot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                isn't that kind of the point of non-determinism?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • earthnail

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:49 PM

No. Good nondeterministic models reproducibly generate equally desirable output: not identical output, but interchangeable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • micromacrofoot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        oh I see, thank you for clarifying