AutoKernel: Autoresearch for GPU Kernels
29 points - today at 7:42 AM
Source
I guess we'd see a lot more benefit if we could get this to work on something like llama.cpp, since it has a lot of kernels for different quantizations, a lot of home users, and high hardware diversity, so it's likely the place with the highest bang for the buck.
I guess they could become a contributor there.
NitpickLawyer
today at 8:31 AM
... and so it begins.
For a bit of context, goog already did something like this two generations of models ago, as announced in this blog post[1] from May '25:
> AlphaEvolve is accelerating AI performance and research velocity. By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini's architecture by 23%, leading to a 1% reduction in Gemini's training time.
We are now seeing the same thing "at home", for any model. And with how RL-heavy the new training runs have become, inference speedups will directly translate into faster training as well.
[1] - https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...
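For anyone unfamiliar with the "dividing a large matrix multiplication into subproblems" idea the blog post mentions: a minimal sketch of the textbook technique (blocked/tiled matmul in NumPy) is below. To be clear, this is not AlphaEvolve's actual discovered decomposition, just the standard tiling idea it builds on; `blocked_matmul` and the tile size are illustrative.

```python
import numpy as np

def blocked_matmul(A, B, tile=64):
    """Compute A @ B by accumulating products of tile x tile blocks.

    Each block product operates on a small submatrix that can fit in
    cache (or a GPU shared-memory tile), which is why decomposition
    choices like this dominate kernel performance.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # NumPy slicing handles ragged edges when the
                # dimensions are not multiples of the tile size.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C
```

The result is identical to a plain `A @ B`; the search space an auto-tuner explores is essentially which tile shapes and loop orderings run fastest on a given chip.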
Have you benchmarked this against autoscheduling, like with TVM's Ansor?