Tiled Hacker news on React Router

AI World Clocks

1132 points - yesterday at 6:35 PM

"Every minute, a new clock is rendered by nine different AI models."


lanewinfield
yesterday at 7:59 PM
hi, I made this. thank you for posting.
I love clocks and I love finding the edges of what any given technology is capable of.
I've watched this for many hours and Kimi frequently gets the most accurate clock but also has the least variation and is the most boring. Qwen is oftentimes the most insane and makes me laugh. Which one is "better?"
otterley
yesterday at 7:57 PM
Watching this over the past few minutes, it looks like Kimi K2 generates the best clock face most consistently. I'd never heard of that model before today!
Qwen 2.5's clocks, on the other hand, look like they never make it out of the womb.
baltimore
yesterday at 7:08 PM
Since the first (good) image generation models became available, I've been trying to get them to generate an image of a clock with 13 instead of the usual 12 hour divisions. I have not been successful. Usually they will just replace the "12" with a "13" and/or mess up the clock face in some other way.
I'd be interested if anyone else is successful. Share how you did it!
ryandrake
yesterday at 8:03 PM
I've been struggling all week trying to get Claude Code to write code to produce visual (not the usual, verifiable, text on a terminal) output in the form of an SDL_GPU-rendered scene consisting of the usual things like shaders, pipelines, buffers, textures and samplers, vertex and index data and so on, and boy it just doesn't seem to know what it's doing. Despite providing paragraphs-long, detailed prompts. Despite describing each uniform and each matrix that needs to be sent. Despite giving it extremely detailed guidance about what order things need to be done in. It would have been faster for me to just write the code myself.
When it fails a couple of times it will try to put logging in place and then confidently tell me things like "The vertex data has been sent to the renderer, therefore the output is correct!" When I suggest it take a screenshot of the output each time to verify correctness, it does, and then declares victory over an entirely incorrect screenshot. When I suggest it write unit tests, it does so, but the tests are worthless and only test that the incorrect code it wrote is always incorrect in the same ways.
When it fails even more times, it will get into this what I like to call "intern engineer" mode where it just tries random things that I know are not going to work. And if I let it keep going, it will end up modifying the entire source tree with random "try this" crap. And each iteration, it confidently tells me: "Perfect! I have found the root cause! It is [garbage bullshit]. I have corrected it and the code is now completely working!"
These tools are cute, but they really need to go a long way before they are actually useful for anything more than trivial toy projects.
munro
yesterday at 7:50 PM
Amazing. Some people who use LLMs for soft outcomes are so enamored with them, and disagree with me when I say to be careful because they're not perfect. This is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks. "Hey this test is failing", LLM deletes test, "FIXED!"
kylecazar
yesterday at 9:58 PM
Non-determinism at its finest. The clock is perfect, the refresh happens, the clock looks like a Dali painting.
anon_cow1111
yesterday at 11:17 PM
I'm having a hard time believing this site is honest, especially with how ridiculous the scaling and rotation of numbers is for most of them. I dumped his prompt into chatgpt to try it myself and it did create a very neat clock face with the numbers at the correct position+animated second hand, it just got the exact time wrong, being a few hours off.
Edit: the time may actually have been perfect now that I account for my isp's geo-located time zone
porphyra
yesterday at 8:58 PM
LLMs can't "look" at the rendered HTML output to see if what they generated makes sense or not. But there ought to be a way to do that right? To let the model iterate until what it generates looks right.
Currently, at work, I'm using Cursor for something that has an OpenGL visualization program. It's incredibly frustrating trying to describe bugs to the AI because it is completely blind. Like I just wanna tell it "there's no line connecting these two points but there ought to be one!" or "your polygon is obviously malformed as it is missing a bunch of points and intersects itself" but it's impossible. I end up having to make the AI add debug prints to, say, print out the position of each vertex, in order to convince it that it has a bug. Very high friction and annoying!!!
zkmon
yesterday at 7:07 PM
Why are Deepseek and Kimi beating other models by such a margin? Does this have to do with their specialization for this task?
anotheryou
today at 10:22 AM
Claude Sonnet 4.5 with a little thinking: https://imgur.com/a/zcJOnKy
no thinking: better clock but not current time (the prompt is confusing here though): https://imgur.com/a/kRK3Q18
mandolingual
yesterday at 9:10 PM
Always interesting/uncanny when AI is tested with human cognitive tests https://www.psychdb.com/cognitive-testing/clock-drawing-test.
em3rgent0rdr
yesterday at 6:59 PM
Most look like they were done by a beginner programmer on crack, but every once in a while a correct one appears.
arendtio
today at 10:31 AM
Pretty cool already!
I use 'Sonnet 4.5 thinking' and 'Composer 1' (Cursor) the most, so it would be interesting to see how such SOTA models perform in this task.
ugh123
yesterday at 6:59 PM
Cool, and marginally informative on the current state of things, but kind of a waste of energy given everything is re-done every minute to compare. We'd probably only need a handful of each to see the meaningful differences.
wanderingmind
today at 3:35 AM
The more I look at it, the more I realise the reason for the cognitive overload I feel when using LLMs for coding. The same prompt to the same model for a pretty straightforward task produces such wildly different outputs. Now, imagine how wildly different the code outputs are when trying to generate two different logical functions. The casings are different, commenting is different, no semantic continuity. Now maybe if I give detailed prompts and ask it to follow them, it might, but in my experience prompt adherence is not so great either. I am at the stage where I just use LLMs as autocorrect, rather than using them for any generation.
gwbas1c
yesterday at 9:55 PM
Reminds me of the Alzheimer's "draw a clock" test.
Makes me think that LLMs are like people with dementia! Perhaps it's the best way to relate to an LLM?
boxedemp
today at 10:33 AM
That's super neat. I'll keep checking back to this site as new models are released. It's an interesting benchmark.
S0y
yesterday at 7:22 PM
To be fair, this is a deceptively hard task.
cornonthecobra
yesterday at 9:32 PM
I like Deepseek v3.1's idea of radially-aligning each hour number's y-axis ("1" is rotated 30° from vertical, "2" at 60°, etc.). It would be even better if the numbers were rotated anticlockwise.
I'm not sure what Qwen 2.5 is doing, but I've seen similar in contemporary art galleries.
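The radial alignment described above (each hour number rotated 30° further than the last) comes down to a little trigonometry. A minimal sketch, assuming SVG/CSS-style coordinates with the origin at the clock centre and y pointing down; the function name, the default radius, and the `divisions` parameter are illustrative and not taken from the site or from any model's output:

```python
import math


def hour_label_layout(hour, radius=100.0, divisions=12):
    """Return (x, y, rotation_deg) for an hour label on a clock face.

    Assumed conventions: origin at the clock centre, x to the right,
    y downward, rotation measured clockwise in degrees.
    """
    step = 360.0 / divisions            # 30 degrees per hour on a 12-hour face
    angle = (hour % divisions) * step   # 12 maps to 0 degrees (straight up)
    theta = math.radians(angle)
    x = radius * math.sin(theta)
    y = -radius * math.cos(theta)       # negative y is "up" on screen
    return x, y, angle


for hour in (12, 1, 2, 3):
    x, y, rot = hour_label_layout(hour)
    print(f"{hour:>2}: x={x:7.2f} y={y:7.2f} rotate({rot:.0f}deg)")
```

Negating the returned rotation is one reading of the "rotated anticlockwise" variant suggested in the comment.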
paxys
yesterday at 8:09 PM
Something I'm not able to wrap my head around is that Kimi K2 is the only model that produces a ticking second hand on every attempt while the rest of them are always moving continuously. What fundamental differences in model training or implementation can result in this disparity? Or was this use case programmed in K2 after the fact?
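For reference, the difference between a ticking and a continuously moving second hand is small in code terms. A minimal sketch, with a hypothetical function name; it only illustrates the two behaviours and says nothing about why one model prefers one over the other:

```python
def second_hand_angle(seconds, ticking=True):
    """Clockwise angle of the second hand in degrees, 0 = pointing at 12.

    `seconds` may be fractional (e.g. 12.75). With ticking=True the fraction
    is discarded, so the hand jumps 6 degrees once per second; otherwise the
    hand sweeps continuously.
    """
    s = seconds % 60
    if ticking:
        s = int(s)
    return s * 6.0  # 360 degrees / 60 seconds


print(second_hand_angle(12.75, ticking=True))   # hand sits on the 12-second mark
print(second_hand_angle(12.75, ticking=False))  # hand is partway to the 13-second mark
```

In CSS the same distinction is commonly expressed by animating a 60-second rotation with a `steps(60)` timing function for ticking versus `linear` for a sweep, so a model's choice may come down to a single keyword.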
edfletcher_t137
yesterday at 11:38 PM
Lack of Claude is a glaring oversight given how popular it is as an agentic coding model...
Vera_Wilde
today at 7:04 AM
It's really beautiful! Super clean UI.
The thing I always want from timezone tools is: “Let me simulate a date after one side has shifted but the other hasn’t.”
Humans do badly with DST offset transitions; computers do great with them.
Bengalilol
yesterday at 10:51 PM
Qwen doesn't care about clocks, it goes the Dali way, without melting.
It even made a Nietzsche clock (I saw one <body> </body> which was surprisingly empty).
It definitely wins the creative award.
chaosprint
yesterday at 11:51 PM
This is such a great idea! Surprisingly, Kimi K2 is the only one without any obvious problems. And it is not even the complete K2 thinking version? This made me reread this article from a few days ago:
https://entropytown.com/articles/2025-11-07-kimi-k2-thinking...
Zeraous
today at 12:18 PM
How Kimi is better than other BILLION$ companies is really fun
earth2mars
yesterday at 7:59 PM
https://gemini.google.com/share/00967146a995 works perfectly fine with gemini 2.5 pro
anonzzzies
today at 12:55 AM
Sonnet 4.5 does it flawlessly. Tried 8 times.
ticulatedspline
yesterday at 8:53 PM
This is cool, interesting to see how consistent some models are (both in success and failure)
I tried gpt-oss-20b (my go-to local) and it looks ok though not very accurate. It decided to omit numbers. It also took 4500 tokens while thinking.
I'd be interested in seeing it with some more token leeway as well as comparing two or more similar prompts, like using "current time" instead of "${time}" and being more prescriptive about including numbers.
collimarco
yesterday at 7:25 PM
In any case those clocks are all extremely inaccurate, even if AI could build a decent UI (which is not the case).
Some months ago I published this site for fun: https://timeutc.com There's a lot of code involved to make it precise to the ms, including adjusting based on network delay, frame refresh rate instead of using setTimeout and much more. If you are curious take a look at the source code.
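The "adjusting based on network delay" step can be illustrated with a round-trip midpoint estimate (in the style of Cristian's algorithm). This is a sketch under stated assumptions, not the timeutc.com implementation: `fetch_server_time` is a hypothetical callable that makes one request and returns the server's timestamp in seconds, and the delay is assumed to be the same in both directions.

```python
import time


def estimate_offset(fetch_server_time, clock=time.time):
    """Estimate (server_clock - local_clock) in seconds from one request.

    The server's reading is matched to the midpoint of the round trip,
    which assumes the request and the response take equally long.
    """
    t0 = clock()
    server_time = fetch_server_time()   # hypothetical network call
    t1 = clock()
    round_trip = t1 - t0
    offset = server_time - (t0 + round_trip / 2)
    return offset, round_trip


def best_offset(fetch_server_time, samples=5, clock=time.time):
    """Take several samples and keep the one with the shortest round trip."""
    results = [estimate_offset(fetch_server_time, clock) for _ in range(samples)]
    return min(results, key=lambda r: r[1])[0]


# Demo with a fake server whose clock runs 2.5 s ahead of the local one.
fake_server = lambda: time.time() + 2.5
print(round(best_offset(fake_server), 2))
```

Keeping the sample with the shortest round trip is a common heuristic, since a fast exchange leaves less room for asymmetric delay to distort the estimate.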
3oil3
today at 5:47 AM
I wonder which model will silently be updated and suddenly start drawing clocks with Audemars-Piguet-level kind of complications.
amelius
yesterday at 8:35 PM
Maybe they can ask Sora to make variations of:
https://slate.com/human-interest/2016/07/martin-baas-giant-r...
shahzaibmushtaq
today at 6:13 AM
Interesting idea!
Why is a new clock being rendered every minute? Or are AI models evolving and improving every minute?
bwhiting2356
today at 4:10 AM
You should render it, show an image to the model and allow it to iterate. No person has to one-shot code without seeing what it looks like.
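The render-and-iterate idea can be sketched as a small feedback loop. Everything here is hypothetical: `generate`, `render`, and `review` are stand-ins for a model call, a headless-browser screenshot, and a visual check, none of which are specified in the thread.

```python
def iterate_until_accepted(generate, render, review, max_rounds=3):
    """Generate markup, render it, and feed visual feedback back in.

    All three callables are hypothetical stand-ins:
      generate(feedback) -> markup string (feedback is None on the first round)
      render(markup)     -> an image or other rendered artefact
      review(image)      -> None if acceptable, else a description of the flaws
    """
    feedback = None
    markup = None
    for round_number in range(1, max_rounds + 1):
        markup = generate(feedback)
        feedback = review(render(markup))
        if feedback is None:
            return markup, round_number
    return markup, max_rounds  # give up and return the last attempt


# Stub demo: the "model" fixes its output once it is told what is wrong.
def fake_generate(feedback):
    return "<clock hands='3'>" if feedback else "<clock hands='0'>"


def fake_review(image):
    return None if "hands='3'" in image else "the clock has no hands"


markup, rounds = iterate_until_accepted(fake_generate, lambda m: m, fake_review)
print(markup, rounds)
```

The `max_rounds` cap matters in practice, since each round costs another model call and a reviewer that never accepts would otherwise loop forever.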
rtcode_io
yesterday at 9:41 PM
See https://clock.rt.ht/::code
AI-optimized <analog-clock>!
People expect perfection on first attempt. This took a brief joint session:
HI: define the custom element API design (attribute/property behavior) and the CSS parts
AI: draw the rest of the f… owl
baidoct
today at 11:45 AM
GPT-5 looks broken
wewtyflakes
today at 4:33 AM
It is funny to see the performance improve across many of the models, somewhat miraculously, throughout the day today.
nasir
yesterday at 8:53 PM
where's opus/sonnet! very curious on that!
syx
yesterday at 6:57 PM
I’m very curious about the monthly bill for such a creative project, surely some of these are pre rendered?
josfredo
today at 4:54 AM
Watching these gives me a strong feeling of unease. Art-wise, it is a very beautiful project.
whimsicalism
yesterday at 8:39 PM
Kimi K2 is obviously the best, but gpt-5 has the most gorgeous ones when it works
orly01
yesterday at 8:41 PM
What does it mean that each model is allowed 2000 tokens to generate its clock?
bigbluedots
today at 12:45 AM
I just realized I'm running late, it's almost -2!
More seriously, I'd love to see how the models perform the same task with a larger token allowance.
kfarr
yesterday at 6:50 PM
Add some voting and you got yourself an AI World Clock arena! https://artificialanalysis.ai/image/arena
hansmayer
yesterday at 9:18 PM
Very funny. It seems Qwen generates the funniest outputs :)
fschuett
yesterday at 7:22 PM
Reminds me of this: https://www.youtube.com/watch?v=OGbhJjXl9Rk
aavshr
yesterday at 8:14 PM
just curious, why not the sonnet models? In my personal experience, Anthropic's Sonnet models are the best when it comes to things like this!
xyproto
yesterday at 8:17 PM
Try adding to the prompt that it has a PhD in Computer Science and has many methods for dealing with complexity.
This gives better results, at least for me.
maxdo
yesterday at 10:44 PM
The selection of Western models is weird: no gpt-5.1, no opus 4.1 (nailed it perfectly). Something I quickly tested
warpspin
today at 12:55 PM
Lol. This is supposed to replace me at my job already?
Great experiment!
bongodongobob
yesterday at 8:28 PM
Weird. Sonnet 4.5 one shotted it with:
Create an interactive artifact of an analog clock face that keeps time properly.
https://claude.ai/public/artifacts/75daae76-3621-4c47-a684-d...
stym06
today at 4:45 AM
If a human had done this, these would be at a museum
esotericwarfare
today at 12:33 AM
This is an AD for Kimi K2
__fst__
yesterday at 10:27 PM
This is why we need TeraWatt DCs, to generate code for world clocks every minute.
HarHarVeryFunny
yesterday at 10:56 PM
Looks like we've got a new Turing test here: "draw me a clock"
ada1981
yesterday at 11:29 PM
Sonnet 4.5 did this easily https://claude.ai/public/artifacts/c1bb5d57-573b-49e0-9539-7...
bigbluedots
today at 12:50 AM
Is there a "draw a pelican riding a bicycle" version?
zkmon
yesterday at 7:24 PM
Was Claude banned from this Olympics?
abathologist
yesterday at 7:12 PM
This is great. If you think that the phenomenon of human-like text generation evinces human-like intelligence, then this should be taken to evince that the systems likely have dementia. https://en.wikipedia.org/wiki/Montreal_Cognitive_Assessment
accrual
yesterday at 11:07 PM
I love that GPT-5 is putting the clock hands way outside the frame and just generally is a mess. Maybe we'll look back on these mistakes just like watching kids grow up and fumble basic tasks. Humorous in its own unique way.
Imanari
yesterday at 9:22 PM
Qwen's clocks are hilarious
Waterluvian
yesterday at 8:44 PM
How do they do time without JavaScript? Is there an API I’m not aware of?
busymom0
yesterday at 7:12 PM
Because a new clock is generated every minute, looks like simply changing the time by a digit causes the result to be significantly different from the previous iteration.
kwanbix
yesterday at 9:06 PM
What a waste of energy.
0xCE0
yesterday at 9:44 PM
Seems like Will's clock drawing test in Hannibal :)
woopwoop
today at 4:52 AM
The qwen clocks are art.
ssl-3
yesterday at 8:51 PM
This really needs to be an xscreensaver hack.
JamesAdir
today at 8:11 AM
I believe that in a day or two, the companies will address this and it would be solved by them for that use case
gloosx
yesterday at 9:35 PM
anyone tried opening this from mobile? not a single clock renders correctly, almost looks like a joke on LLMs
jcmontx
yesterday at 8:42 PM
Grok is impressive, I should give it a shot
AlfredBarnes
yesterday at 7:05 PM
It's cool to see them get it right... sometimes
miohtama
today at 12:36 AM
The new Turing time test
hollow-moe
yesterday at 9:57 PM
obviously they're all broken on firefox, no one uses firefox anyways
mstipetic
yesterday at 7:26 PM
GPT-5 is embarrassing itself. Kimi and DeepSeek are very consistently good. Wild that you can just download these models.
bananatron
yesterday at 7:02 PM
grok's looks like one of those clocks you'd find at a novelty shop
shubham_zingle
yesterday at 7:27 PM
not sure about the accuracy though, although shooting in the dark
lxe
yesterday at 7:17 PM
Honestly, I think if you track the performance of each over time, since these get regenerated once in a while, you can then have a very, very useful and cohesive benchmark.
cyberjill
today at 3:12 AM
666
larodi
yesterday at 7:02 PM
would be gr8t to also see the prompt this was done with
imchillyb
today at 2:12 AM
I love qwen, it tries so hard with its little paddle and never gets anywhere.
1yvino
yesterday at 7:18 PM
I wonder if the Qwen prompt would look like a hallucination?
bitwize
yesterday at 11:01 PM
I'm reminded of the "draw a clock" test neurologists use to screen for dementia and brain damage.
teaearlgraycold
yesterday at 10:28 PM
Qwen 2.5 doing a surprisingly good job (as of right now).
DeathArrow
yesterday at 9:43 PM
How can Deepseek and Kimi get it right while Haiku, Gemini and GPT are making a mess?
eastbound
yesterday at 8:24 PM
Security-wise, this is a website that takes the straight output of AI and serves it for execution on their website.
I know, developers do the same, but at least they check it in Git to notice their mistakes. Here is an opportunity for AI to call a Google Authentication on you, or anything else.
bpt3
yesterday at 8:21 PM
It's wild how much the output varies for the same model for each run.
I'm not sure if this was the intent or not, but it sure highlights how unreliable LLMs are.
novemp
yesterday at 7:54 PM
Oh cool, it's the schizophrenia clock-drawing test but for AI.
system2
yesterday at 7:40 PM
Ask Claude or ChatGPT to write it in Python, and you will see what they are capable of. HTML + CSS has never been the strong suit of any of these models.
shevy-java
yesterday at 7:16 PM
Now that is actually creative.
Granted, it is not a clock - but it could be art. It looks like a Picasso. When he was drunk. And took some LSD.
jonplackett
yesterday at 7:12 PM
kimi is kicking ass
fnord77
today at 2:05 AM
whatever model Cursor uses was telling me the date was March 12, 2023
surfingdino
today at 9:08 AM
What a wonderfully visual example of the crap LLMs turn everything into. I am eagerly awaiting the collapse of the LLM bubble. JetBrains added this crap to their otherwise fine series of IDEs and now I have to keep removing randomly inserted import statements and keep fixing hallucinated names of functions suggested instead of the names of functions that I have already defined in the same file. Lack of determinism where we expect it (most of the things we do, tbh) is creating more problems than it is solving.
jsmo
today at 6:05 AM
lol
10/04/2025
Gormanu
yesterday at 7:02 PM
[dead]
superlukas99
today at 3:28 AM
[dead]
PeterStuer
yesterday at 6:59 PM
Why? This is diagonal to how LLMs work, and trivially solved by a minimal hybrid front/sub system.
awkwam
yesterday at 7:37 PM
Limiting the model to only 2000 tokens while also asking it to output ONLY HTML/CSS is just stupid. It's like asking a programmer to perform the same task after removing half their brain and making them forget their programming experience. This is a stupid and meaningless benchmark.
kburman
yesterday at 7:17 PM
These types of tests are fundamentally flawed. I was able to create a perfect clock using gemini 2.5 pro - https://gemini.google.com/share/136f07a0fa78
