Some feedback for AMD executives on the current state of ROCm:
(1) - Supporting only server-grade hardware and ignoring laptop/consumer-grade GPUs/APUs for ROCm was a terrible strategic mistake.
A lot of developers experiment first and foremost on their personal laptops and scale up to expensive, professional-grade hardware later. In addition, some developers simply do not have the money to buy server-grade hardware.
By locking ROCm to server-grade GPUs only, you restrict the pool of potential contributors to your OSS ROCm ecosystem to a few large AI users and a few HPC centers... meaning virtually nobody.
A much more sensible strategy would be to support ROCm on consumer GPUs, even with degraded performance; this is exactly what Nvidia does with CUDA.
This is changing, but you need to send a clear message here: EVERY newly released device should be properly supported by ROCm.
(2) - Supporting only the last two architecture generations is not what customers want to see.
https://rocm.docs.amd.com/projects/install-on-linux/en/docs-...
People with existing GPU codebases invest a significant amount of effort to support ROCm.
Telling them two years later "Sorry, you are out of updates now!" while the ecosystem is still unstable is unacceptable.
CUDA excels at backward compatibility. The fact that you ignore it entirely plays against you.
(3) - Focusing exclusively on Triton and making HIP a second-class citizen is nonsensical.
AI might get all the buzz and the money right now, we get it.
It might look sensible on the surface to focus on Python-based, AI-focused tools like Triton, and supporting them is definitely necessary.
But there is a tremendous amount of C and C++ code running on GPUs (HPC, simulation, scientific computing, imaging, ...) that will remain there for decades to come.
Ignoring that means losing, again, customers to CUDA.
It is pretty ironic to see a move like that, considering AMD GPUs currently tend to be highly competitive on FP64, which is exactly what these kinds of applications need. You are throwing away one of your own competitive advantages...
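To make the point concrete, this is the kind of C++ GPU code at stake: a minimal FP64 AXPY sketch in HIP (names and sizes are illustrative, not from any particular codebase). Large HPC and simulation projects consist of thousands of kernels like this, which is why first-class HIP support matters.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Double-precision AXPY (y = a*x + y), the kind of FP64
// workload typical of HPC/simulation codes.
__global__ void daxpy(int n, double a, const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<double> hx(n, 1.0), hy(n, 2.0);
    double *dx, *dy;
    hipMalloc(&dx, n * sizeof(double));
    hipMalloc(&dy, n * sizeof(double));
    hipMemcpy(dx, hx.data(), n * sizeof(double), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(double), hipMemcpyHostToDevice);

    // Launch with 256 threads per block, enough blocks to cover n.
    hipLaunchKernelGGL(daxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                       n, 3.0, dx, dy);

    hipMemcpy(hy.data(), dy, n * sizeof(double), hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);  // 3.0 * 1.0 + 2.0 = 5.0
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```

Note that this is nearly line-for-line identical to the CUDA equivalent, which is precisely HIP's value proposition for porting existing C/C++ GPU codebases.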
(4) - Last but not least: Please focus a bit on the packaging of your software solution.
There have been complaints about this for the last 5 years and not much has changed.
Working with distribution packagers and integrating with them does not cost much... This would currently give you a competitive advantage over Nvidia...
Additional point: CUDA is polyglot, and some people do care about writing their kernels in something other than C++, C, or Fortran, without going through code generation.
Nvidia is acknowledging Python adoption, with cuTile and MLIR support for Python, allowing the same flexibility as C++, with even kernels written directly in Python.
They seem to be supportive of having similar capabilities for Julia as well.
Then there is the IDE and graphical debugger integration, and the library ecosystem, which now also has Python variants.
As someone who only follows GPGPU on the side, due to my interest in graphics programming, I find it hard to understand how AMD and Intel keep failing to grasp what CUDA, the whole ecosystem, is actually about.
Like, just take the schedule of a random GTC conference: how much of it can I reproduce on oneAPI or ROCm as of today?