NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute

94 points - today at 5:56 PM

linolevan
today at 10:23 PM
There was a very interesting paper out of Stanford this past September about pretraining under the unlimited-compute, limited-data paradigm [0]. Pretty much exactly the same thing, but with ~200M training tokens instead.
[0] https://www.alphaxiv.org/abs/2509.14786
kseniamorph
today at 9:08 PM
Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
archermarks
today at 7:23 PM
Very cool idea. Interested to see how this progresses. One question: how worried are you about over-training on this particular dataset, i.e. leaning toward memorization instead of generalizing? Obviously you hold out a validation set, but since you're meta-optimizing the model itself on its validation performance, you're still at risk of over-fitting.
lzaborowski
today at 7:52 PM
I like the idea of flipping the constraint. Most ML benchmarks assume unlimited data and limited compute, so people optimize for speed.
If high-quality training data becomes the real bottleneck, then the interesting question is how much signal you can extract from the same dataset when compute is cheap.
suddenlybananas
today at 6:43 PM
Reminds me a fair bit of the BabyLM challenge. It would be good to give them a shout-out and see how this challenge differs.
navvyeanand
today at 7:59 PM
Amazing job!
riajain2525
today at 8:34 PM
Super cool!
