
Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

79 points - today at 4:55 PM


kraddypatties
today at 5:40 PM
I feel like most of this recent Autoresearch trend boils down to reinventing hyper-parameter tuning. Is the SOTA still Bayesian optimization when given a small cluster? It was ~3 years ago when I was doing this kind of work, haven't kept up since then.
Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!
pbkhrv
today at 8:16 PM
> How parallelism changed the agent’s research strategy
>
> With a single GPU, the agent is stuck doing greedy hill-climbing: try one thing, check the result, pick a direction, try the next thing. With 16 GPUs, the strategy shifts. ...skip... 12 experiments in a single 5-minute wave. This makes it much harder to get stuck in local optima and much easier to find interaction effects between parameters.
The agent can theoretically come up with a protocol to run those same 12 experiments one-by-one and only then decide which branch to explore next - which I think would lead to the same outcome?
But in this case, it just happened to have stumbled on this particular outcome only because it didn't get a chance to execute a greedy strategy after the first 1 or 2 results.
Worse experiment design + parallelism = better experiment design + serialized execution?
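A toy sketch of the point above, not taken from the article: the loss table, `run_experiment`, and the two "tricks" are all hypothetical. It assumes a case where two changes only help together, so a greedy one-at-a-time search stalls, while evaluating the whole wave and deciding afterwards finds the better config whether the wave runs serially or in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy results: (use_trick_a, use_trick_b) -> validation loss.
# Each trick hurts on its own; together they help (an interaction effect).
LOSS = {(0, 0): 1.00, (1, 0): 1.20, (0, 1): 1.10, (1, 1): 0.50}

def run_experiment(config):
    """Stand-in for one fixed-budget training run; lower is better."""
    return LOSS[config]

def greedy(start):
    """Flip one setting at a time and keep a change only if it improves."""
    best = start
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            candidate = tuple(1 - v if j == i else v for j, v in enumerate(best))
            if run_experiment(candidate) < run_experiment(best):
                best, improved = candidate, True
                break
    return best

def batch_serial(configs):
    """Run the whole wave one by one, then decide."""
    results = {c: run_experiment(c) for c in configs}
    return min(results, key=results.get)

def batch_parallel(configs, workers=4):
    """Run the same wave concurrently, then decide."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(run_experiment, configs))
    return min(zip(configs, losses), key=lambda pair: pair[1])[0]

wave = list(LOSS)
print(greedy((0, 0)))        # (0, 0): stuck, every single flip looks worse
print(batch_serial(wave))    # (1, 1)
print(batch_parallel(wave))  # (1, 1): same decision, less wall-clock time
```

In this toy setup the difference comes from deciding after the full wave rather than after each result; parallelism only changes how long the wave takes.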
zhwu
today at 6:00 PM
The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.
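For readers who want the shape of that strategy, here is a minimal sketch of screen-then-promote. It is an illustration only, not the agent's actual code: the candidate names, the loss numbers, and the `screen` / `validate` callables are hypothetical stand-ins for runs on the cheaper and the faster GPU tier.

```python
def screen_and_promote(candidates, screen, validate, top_k=2):
    """Score every idea on the cheap tier, re-run only the top_k on the fast tier."""
    promoted = sorted(candidates, key=screen)[:top_k]
    validated = {c: validate(c) for c in promoted}
    best = min(validated, key=validated.get)
    return best, validated

# Hypothetical losses (lower is better); the two tiers need not agree on ranking.
SCREEN_LOSS = {"baseline": 1.00, "wider_mlp": 0.97, "new_lr_schedule": 0.95, "extra_norm": 1.03}
VALIDATE_LOSS = {"baseline": 0.96, "wider_mlp": 0.91, "new_lr_schedule": 0.93, "extra_norm": 0.99}

best, validated = screen_and_promote(
    list(SCREEN_LOSS),
    screen=SCREEN_LOSS.get,      # stand-in for a run on the cheaper tier
    validate=VALIDATE_LOSS.get,  # stand-in for a run on the faster tier
    top_k=2,
)
print(best)       # wider_mlp
print(validated)  # {'new_lr_schedule': 0.93, 'wider_mlp': 0.91}
```

Only two of the four ideas ever touch the scarcer tier, and the validation pass can still reorder the screened winners.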
fabmilo
today at 7:01 PM
I am fascinated by this example of using AI to improve AI. I won a small prize using this technique on Helion kernels at a PyTorch hackathon in SF.
The next steps are:
- give the agent the whole deep learning research literature and do tree search over the various ideas that have been proposed in the past.
- have some distributed notepad that any of these agents can read and improve upon.
covi
today at 6:00 PM
This feels like the chimpanzee with a power drill. An agent is honestly just brute-force search, but guided.
ipsum2
today at 6:25 PM
A cluster is 2 nodes? That's technically true, but not very exciting.
saberience
today at 7:49 PM
Wait, "Karpathy's Autoresearch", you mean a loop that prompts the agent to improve a thing given a benchmark?
People have been doing this for a year or more, Ralph loops etc.
I hate the weird strange Twitter world of hero-worship for folks that seems to arise just out of large followings.
Joe no-followers does this six months ago, nobody cares. Karpathy writes a really basic loop and it's now a kind of AI miracle prompting tons of grifters, copy-cats, weird hype.
I do wonder if LLMs have just made everyone seriously, seriously dumber all of a sudden. Most of the "Autoresearch" posts I see are completely rubbish, with AI optimizing for nonsense benchmarks and people failing to understand the graphs they are looking at. So yes, the AI made itself better at a useless benchmark while also making the code worse in 10 other ways you don't actually understand.
