The writing is laughably bad. I can’t tell if it’s someone who over-relied on AI or if they just mimic the structure and mannerisms of AI-produced writing because that’s what they see.
A few choice examples:
> Checkout part one of this series for an intro to HipKittens and checkout this post for a technical deep dive.
> Unsurprisingly, making AMD GPUs go brr boils down to keeping the “matrix cores” (tensor cores on NVIDIA) fed.
> These two patterns tradeoff programmability and performance, where 8-wave and its large tile primitives lead to compact code and 4-wave fine-grained interleaving expands code size. Surprisingly, the 8-wave schedule is sufficient to achieve SoTA-level performance on GEMMs and attention forwards. For GQA non-causal attention backwards, 8-wave also outperforms all AMD baselines by 1.8×, and our HK 4-wave further outperforms by 2.3×.
And I could go on. And on.
But overall, besides the overuse of cliche/memespeak in places where it doesn’t make sense, the entire section that deals with the hot loop describes something that should be explained in a graph and is instead explained in 100 lines of source code.
beepbooptheory
today at 1:32 PM
Am I crazy? What is wrong with any of those quotes?