Tiled Hacker news on React Router

I put a datacenter GPU in my gaming PC

163 points - today at 1:53 PM

sonzohan
today at 5:33 PM
I also recently decided to buy a datacenter GPU and slap it into a system. Some notes from my experience that the author doesn't mention in their article:
Decommissioned NVIDIA V100s and AMD MI50s are fairly cheap, $200 for 16gb and $400-500 for 32gb, for local experimentation. They are also very old. There's an enthusiast community keeping these two cards alive and working with current platforms and models.
Nitpick, but the V100 doesn't support bfloat16. The performance hit is not a big deal if you're fiddling with local models, but the card is on its way out in terms of hardware features.
The MI50 does support bf16, but the current edition of AMD ROCm does not support the card. Vulkan support is good and the MI50 works with most major platforms (llama.cpp, vLLM, etc.), but it's not without some pain points like manual recompilation. Fortunately the open source community has already paid most of your way.
The cooling requirements for these cards cannot be overstated. A consumer-grade GPU may throttle in a small case without additional fans, but given the same treatment a datacenter GPU will overheat even while idling. You will need to buy, at the very least, a bunch of decent 120mm fans to prevent this, or invest in some water cooling.
I ultimately went with an AMD MI100 32GB ($950). I'm an AMD fan, current ROCm editions support it, and it was low-fuss to get things working. I'm debating getting a second so I can try out bigger models like qwen3-coder-next.
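For a rough side-by-side of the prices quoted in this comment, here is a minimal Python sketch of cost per GB of VRAM. The helper name `usd_per_gb` is made up for illustration, and the figures are only the used-market prices mentioned above, not current listings.

```python
def usd_per_gb(price_usd: float, vram_gb: int) -> float:
    """Price in USD divided by VRAM capacity in GB."""
    if vram_gb <= 0:
        raise ValueError("VRAM capacity must be positive")
    return price_usd / vram_gb


# (label, price in USD, VRAM in GB), taken from the comment above.
cards = [
    ("V100 / MI50 16GB", 200, 16),
    ("V100 / MI50 32GB (low end)", 400, 32),
    ("V100 / MI50 32GB (high end)", 500, 32),
    ("MI100 32GB", 950, 32),
]

if __name__ == "__main__":
    for label, price, vram in cards:
        print(f"{label}: ${usd_per_gb(price, vram):.2f}/GB")
```

By this measure the MI100 costs roughly twice as much per GB as the older cards, which is the premium being paid for current ROCm support.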
mettamage
today at 5:57 PM
> The way it works is that a vision encoder (similar to what ChatGPT and Claude use) takes image pixels and translates them into the LLM’s token embedding space. The model does not “see” the image the way a human does. Instead, the vision encoder compresses the image into a sequence of vectors that live in the same mathematical space as text tokens. The LLM then processes those vectors as if they were just another sequence of tokens.
Could you also do this for music and specifically sound synthesis? It would be awesome to vibe synthesize sounds and then see the VSTi parameters surrounding it.
Teknomadix
today at 2:39 PM
Tesla V100 SXM2 16GB is NOT DGX class as the author writes. It's HGX class. The V100 comes in two classes, SXM2 and SXM4, the latter coming with a max of 80GB of on-board memory. Typically these are installed as 8×A100 80GB SXM4 on an HGX riser, and what that gives you is NVSwitch fabric and 640GB of pooled HBM2e (on-package stacked memory w/ ~2 TB/s of memory bandwidth). 2U standard rack footprint too.
mickeyp
today at 2:44 PM
Impressive work. But the problem is not the 30 tok/s which is fine for agentic coding and chat.
It's prefill; slow prefill kills agentic workloads dead.
If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:
```
    You have: 100000 / (150/s)

    You want: hms

     11 min + 6.6666667 sec
```
Which is quite a wait indeed.
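The same arithmetic as a small Python sketch, for anyone who wants to plug in their own numbers. The function names `prefill_seconds` and `format_hms` are made up for illustration; the 100,000-token prompt and ~150 tok/s prefill rate are the figures quoted above, and the sketch assumes the prefill rate stays constant over the whole prompt.

```python
def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tok_per_s


def format_hms(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return f"{hours} h {minutes} min {secs:.1f} s"


if __name__ == "__main__":
    # 100,000 prompt tokens at ~150 tok/s, as in the comment above.
    print(format_hms(prefill_seconds(100_000, 150)))
```

Halving the context or doubling the prefill rate halves the wait, so the prompt size an agent carries around matters as much as the card.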
jonhohle
today at 4:46 PM
I was just looking into this and was worried about the fan setup. Interesting that he was able to solve it with good results.
In case anyone is interested, I’m using PCIE passthrough on a FreeBSD host to a Linux guest with an older Pascal card. It’s worked great and I’ve been thinking about putting a nicer card in there. The SXM route seems great, but I’ve been burned (almost literally because of the heat) by DC components before.
bob1029
today at 3:01 PM
> And yes, if you want the absolute best, Opus 4.8 exists. It also costs more per 20 minutes of heavy use than I paid for this entire GPU and adapter setup combined. But the gap is shockingly small.
I don't think this is a fair characterization of the situation. I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month. The fact that we figured out how to burn double this in 20 minutes is impressive, but I don't think it reflects the reality that many are experiencing right now. There are some exceptionally gluttonous approaches to harnessing LLMs that I think are serving as convenient straw men in these discussions.
Paying for the API will almost always be more economical than self-hosting equivalent infrastructure. I am not against self-hosting, but the article suggests a primarily economic motivation for this effort. If you are consuming fewer than 10^9 tokens per month, I really don't think it's worth your time to try and compete with the hyperscalers. Most of the money is to be found in the integration of this technology with existing businesses.
matja
today at 2:26 PM
The AMD MI250X GPUs are also interesting - 128GB of HBM2E at 3TB/s, sometimes you see them second-hand for under $1k, the catch obviously is that it needs an OAM socket. Never seen an easy way to hook them up to a regular mainboard.
mondainx
today at 2:28 PM
Great write-up, I've often considered these DC cards for a project and now you've convinced me to pick one up; you describe the price of the unit against what one spends on tokens and that does it for me.
segmondy
today at 4:16 PM
The most interesting and perhaps most useful part for most people would be how they control the fan. If you are thinking of doing this, you really want to get those fans under control; they are loud. For anyone thinking of these, V100s idle super high! 25-35 W with nothing loaded and easily 50 W when a model is loaded.
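To put those idle numbers in perspective, here is a minimal Python sketch converting a constant draw into energy per year. The helper name `kwh_per_year` is made up for illustration, and it assumes the card stays powered around the clock for 365 days, which may not match how you actually run it.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: powered on all year


def kwh_per_year(watts: float) -> float:
    """Energy used in one year at a constant power draw, in kWh."""
    if watts < 0:
        raise ValueError("power draw cannot be negative")
    return watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    # Draw figures from the comment above.
    for label, watts in [("idle (low)", 25), ("idle (high)", 35), ("model loaded", 50)]:
        print(f"{label}: {kwh_per_year(watts):.1f} kWh/year")
```

Multiply the result by your own electricity rate to get a yearly cost.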
lucamark
today at 2:22 PM
Congrats! Most people won’t want to debug drivers, kernels, ACPI, adapters, and fan headers. But for those who do, the capability-per-pound is absurd.
omarqureshi
today at 2:40 PM
Could probably avoid the crazy fan with a waterblock - I've seen a whole kit, v100 + PCIE adapter + block for £235. Yes, you'll have to pay for pump, radiators and radiator fans, but that should really quieten it down
abejfehr
today at 3:17 PM
Based on the title I was really hoping to see how this was used for gaming, but they just ran an LLM on it
00dazzle
today at 4:21 PM
That's the same price per VRAM GB as an arc pro B70
whoamii
today at 2:58 PM
The real question: did your local LLM write this post?
ewy1
today at 3:08 PM
despite gaming being used in the title, it is not mentioned in the article, but i'm curious how this performs.
i've run some multi-vendor frankenstein setups before and sometimes it even works, so i'm curious to hear your experience with it.
jmyeet
today at 2:27 PM
Some context:
- In 2017, the v100 was a ~$10,000 GPU. I believe there was a PCI-e version but this is probably so cheap because SXM2 is going to be harder to use;
- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9 year old GPU). Of course a 5090 is substantially more expensive;
- A 5090 has ~21k CUDA cores vs ~5k;
- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but it is otherwise pretty much just a 5090. This is unsurprising. NVidia uses VRAM for market segmentation.
Consider this: in 5-10 years, the trillions spent on AI data centers will likewise be sold for scrap most likely. That's how short the runway is for OpenAI and Anthropic to recover that investment.
Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like props to whoever did that.
KnuthIsGod
today at 3:25 PM
AI written posts will kill HN.
pogue
today at 3:17 PM
But could you game with the GPU? Or is that purely a drivers issue?
axpy906
today at 3:26 PM
Wow. V100. That brings back memories. Way to go.
viseyth
today at 3:56 PM
Volta (and Pascal, which I'm using) should still be supported with driver 580 as long as you don't use the open modules, and you can use up to cuda 12.9 and cudnn 9.10.2. No need to limit yourself to an old kernel.
gtirloni
today at 3:27 PM
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
sigh
recursivegirth
today at 2:42 PM
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.
wg0
today at 3:18 PM
Wait a few years, everyone will be able to put one at half the price.
casey2
today at 2:30 PM
Some resell group is going to have to make this easier. The sheer number of these cards otherwise heading towards the landfill is staggering. That is if Big Tech don't destroy them to prevent model weights from leaking.
hypfer
today at 2:42 PM
[dead]
lelanthran
today at 2:23 PM
> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.
Because humans write exactly like this /s
knollimar
today at 2:27 PM
A little bit of local copium but neat read.
Isn't a Raspberry Pi with 16GB of RAM $300 now?
