Tiled Hacker news on React Router

NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute

45 points - today at 6:51 PM

nsnzjznzbx
today at 8:59 PM
We will get to the point where you can quickly bootstrap i.e. an LLM can train a better LLM in a loop, leave it and it can really learn. Like learn learn.
"Train yourself to solve this problem see OBJECTIVE.md"
littlestymaar
today at 7:38 PM
> Data efficiency matters because compute grows much faster than data [2] (referencing a paper from 2022)
I'm not convinced this is particularly true in today's world, if you have more compute, you can simply generate more, and higher quality, artificial data. That's what all labs have been doing since at least 2023.
Also, the post references the Chinchilla-optimal training as a comparison baseline, but everyone has moved far beyond Chinchilla scaling, small models are routinely trained on 10-400 times more data than (1-40T tokens) than the Chinchilla-optimal number, so the entire industry went the complete opposite of what they are proposing.
That doesn't mean the techniques presented here are useless or anything (I'm not qualified to judge) but you should take the introduction with a grain of salt.
yorwba
today at 7:30 PM
Related: Discussion on the initial NanoGPT Slowrun announcement: https://news.ycombinator.com/item?id=47251259 (185 points 15 days ago, 39 comments)
AliEveryHour16
today at 9:04 PM
[dead]