The last six months in LLMs in five minutes

130 points - today at 1:30 AM

LZ_Khan
today at 5:10 AM
I'm curious how the 6 months have looked from a non-programmer's perspective. What kind of co-working tools and similar optimizations have people from other fields experienced?
tptacek
today at 4:49 AM
If you're a vulnerability researcher or a security person generally, there's a big inflection point from Spring of this year.
Insanity
today at 3:37 AM
I wonder how much the 'inflection point' is a thing vs marketing. I'm sure the models got somewhat better, but even now when I'm trying to 'vibe code' a game with the latest models (combination of Codex w/ gpt5.5 and gpt5.3-codex), they really do struggle.
They definitely get something barebones up and running, but it's far from a fully fledged application.
shepherdjerred
today at 3:28 AM
> and there’s zero chance any AI lab would train a model for such a ridiculous task.
I'm not sure that's true anymore considering how popular Simon's blog is
grey-area
today at 4:51 AM
Haven’t noticed much significant progress in LLMs myself in 6 months (significant as in new or vastly improved capabilities or understanding, not new releases, there are plenty of those).
I feel like if anything people started to realise the significant limitations of LLMs when you try to use them as ‘agents’ which was the big direction LLM companies tried to push recently.
Best use of LLMs so far IMO is finding vulnerabilities (with human help) and pattern matching in other domains. For generating code and prose they are still mediocre and somewhat unreliable and for use as personal assistant agents I wouldn’t trust them.
So what’s happening with openclaw, the biggest experiment in agentic, vibe coded by the agents themselves? The thing that was so hot a few months ago.
https://github.com/openclaw/openclaw/pulse?period=daily
279 commits to main from 77 authors in the last 24 hours.
Why is there so much churn and how could you trust it with your data? These are the changes in ONE day!
If these are useful changes, surely it’d be superhuman by now given months of this pace.
What are people using this for?
throwaway2027
today at 3:30 AM
December 2025 was the breakthrough for me. In January, Claude was euphoric and ChatGPT was up there. In February, Gemini cooked for a second there. March was amazing. April was the big bad nerf. In May, GPT 5.5 is just pure bliss, although the 2x limits are temporary. Not sure about Claude: it's sort of okay, still not as good as it felt before, slowly increasing limits with more compute and rebuilding goodwill.
zarzavat
today at 3:16 AM
Somewhere right now some human artist is being tasked with drawing illustrations of pelicans riding bicycles to be used as training data at a big AI lab.
dnnddidiej
today at 5:08 AM
Also LinkedIn wars of people trying to claim the throne as most AI-pilled, throwing down strawman stories of luddites yelling at data centres who'll lose their jobs to a single person doing 100x the work.
vishal_new
today at 4:58 AM
What are your thoughts on software engineer replacement? My team has already seen big reductions. The QA team is gone. Software engineers were reduced by a third. Scared for the future.
rTX5CMRXIfFG
today at 3:53 AM
Am I crazy, or are these differences between the best models so marginal that you’d get roughly the same performance if you use the same high-quality harness (i.e., preloaded instructions from md files, including custom skills)?
ex-aws-dude
today at 4:30 AM
Is RLVR the key breakthrough for the uplift or is there more to it?
Does that suggest the uplift was only for things that are easily verifiable like code?
DeathArrow
today at 4:27 AM
Apart from GLM 5.1 and Qwen 3.6, there are other Chinese models that are noteworthy: Kimi K2.6, Xiaomi MiMo V2.5 Pro, Deepseek v4 and MiniMax M2.7.
DeathArrow
today at 4:44 AM
I think that there's a lot to be improved in harnesses and the way the models are interacting with harnesses. For example, the harness should be able to steer the model when thinking.
bluegatty
today at 4:04 AM
'Producing Images' or even 'Some Code that is Valid and Compiles' is in some ways one of the most misleading ways we assess quality of the AI.
It is getting very good at producing code that compiles - at the algorithmic level.
This is definitely noteworthy - and the AI is crossing a critical 'productivity threshold'.
But 'Drawing of a Proper Duck' is almost arbitrary because it may have nothing to do with the 'Specific Duck You Wanted'.
Everyone has tried to get AI to 'Draw The Thing They Want' and you notice immediately how it's almost impossible to 'adjust the image' along the vector you want - because ... and this is key:
-> the AI doesn't really understand what a Duck is, its components, or fully how it made the duck <-
It just knows how to 'incant' the duck.
This becomes very clear when you try to get the AI to write proper documentation - it fails so miserably, even with direct guidance.
This is really strong evidence of how poorly the AI is generalizing, and that it is not 'understanding'; rather, it's 'synthesizing' from patterns.
We already kind of knew that - but we had not built an intuition for it until now.
Only now can we see 'how amazing the pattern synthesis' is - it's almost magic - and yet how it falls off a cliff otherwise.
This has deep implications for the 'road ahead' and the kinds of things we're going to be able to do with AI.
In short: the AI is 'Wizard Level Code Helper, Researcher, and Worker' - but it very clearly lacks capabilities even one level of abstraction above the code itself.
LLMs were first trained by 'text' and now ... they are 'trained by our compilers'. Basically g++, javac, tsc are the 'Verifiable Rewards' in the post-training and reinforcement learning - and the AI is getting extremely good at producing 'code that compiles', but that's definitely an indirection from 'code that does what we want'.
It's astonishing that it took us all this time to internalize and start to discover what I think will be in hindsight a very obvious 'threshold' of its capabilities.
We are constantly 'amazed' at the work that it can do, and therefore over-project its capabilities.
I have no doubt that even with these limitations - the AI will unlock a lot more as it gets better - and - that it will 'creep up' the layers of abstraction of its understanding.
But I strongly believe that the AI is going to get much 'wider' (pattern matching dominance) before it gets 'higher' (intrinsic understanding) - and - that this may be a fundamental limitation.
This may be 'the LeCun' insight - when he talks about the limitations of LLMs in detail - I believe this is that insight writ large.
Even the term AI - or certainly 'AGI' may be a misleading metaphor - were we to have always called it 'Stochastic Algorithms' or something along those lines, it's possible that our intuition would be framed a bit better.
The most interesting thing is how it is definitely amazing, world changing, novel and powerful in some ways - and obviously useless in others at the same time. That's the 'threshold' we need to better understand.
bb88
today at 3:14 AM
I met Simon for the first time this year at PyCon. Wow, what a great guy.
tayo42
today at 4:31 AM
The claw thing really came and went fast lol
aizk
today at 3:31 AM
I'm so glad Simon is documenting this. The field is evolving so fast, so rapidly, so hungry for data and money, that few are willing to zoom out and document everything big picture so we can see the changes over time. I mean do you guys remember "Do anything now"? Just a distant memory, a funny party trick.
hmaddipatla
today at 1:34 AM
[dead]
iekekke
today at 3:06 AM
It’s good to see dates being hard-coded for improvements in the models that should deliver material gains.
As time progresses, one now has a yardstick to measure progress against. No more excuses - show me the money, baby.
