How an inference provider can prove they're not serving a quantized model

58 points - today at 6:53 AM

hleszek
today at 9:22 PM
Why not allow the user to provide the seed used for the generation? That way we could at least detect a model change when the same prompt with the same seed suddenly gives a new answer (assuming they don't cache answers). You could also compare different providers that supposedly serve the same model, and if the model is open-weight you could even run the comparison yourself on your own hardware or on rented GPUs.
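A minimal sketch of that comparison, assuming each provider can be wrapped in a function that takes a prompt and a seed and returns text. The names `fingerprint`, `compare_providers`, and `all_agree` are hypothetical, and no real provider API is implied. Matching hashes suggest identical behavior, but a mismatch alone is not proof of a swapped model, since batching and hardware differences can change outputs even with a fixed seed.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical provider signature: (prompt, seed) -> generated text.
Generate = Callable[[str, int], str]


def fingerprint(text: str) -> str:
    """Hash an output so runs can be compared without storing full text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_providers(
    providers: Dict[str, Generate], prompt: str, seed: int
) -> Dict[str, str]:
    """Run the same prompt and seed on every provider and fingerprint each output."""
    return {name: fingerprint(gen(prompt, seed)) for name, gen in providers.items()}


def all_agree(fingerprints: Dict[str, str]) -> bool:
    """True when every provider produced byte-identical output."""
    return len(set(fingerprints.values())) <= 1
```

Storing the fingerprint from an earlier run and comparing it against a later one covers the "has the model changed" case in the same way.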
wongarsu
today at 8:40 PM
I'm somehow more convinced by the method shown in the introduction of the article: run a number of evals across model providers, see how they compare. This also catches all other configuration changes an inference provider can make, like KV-cache quantization. And it's easy to understand, talk about, and the threat model is fairly clear (be wary of fixed answers to your benchmark if you're really distrustful)
Of course, conceptually attestation is neat and wastes less compute than repeated benchmarks. It definitely has its place.
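A rough sketch of the eval comparison, assuming you already have pass/fail counts for the same benchmark on two providers. The function name `two_proportion_z` is hypothetical, and the two-proportion z-test is one simple choice for judging whether an accuracy gap is larger than noise; it is not the method from the article.

```python
import math


def two_proportion_z(
    a_correct: int, a_total: int, b_correct: int, b_total: int
) -> float:
    """z-score for the accuracy gap between provider A and provider B.

    Positive values mean A scored higher. A large absolute value (roughly
    above 2 or 3) suggests the gap is unlikely to be sampling noise alone.
    """
    p_a = a_correct / a_total
    p_b = b_correct / b_total
    # Pooled accuracy under the assumption that both providers are identical.
    pooled = (a_correct + b_correct) / (a_total + b_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a_total + 1 / b_total))
    if se == 0:
        # Both providers got everything right or everything wrong.
        return 0.0
    return (p_a - p_b) / se
```

For example, 90/100 against 70/100 gives a z-score of about 3.5, which would be worth investigating, while 90/100 against 88/100 would not stand out.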
viraptor
today at 8:39 PM
The title here seems very different from the post. All that verification happens locally only. There's no remote validation at any point. So I'm not sure what the reason is to even apply this check. If you're running the model yourself, you know what you're downloading and can check the hash once for transfer problems. Then you can do different things to prevent storage bitrot. But you're not proving anything to your users this way.
You'd need to run a full, public system image with known attestation keys and return some kind of signed response with every request to do that. Which is not impossible, but the remote part seems to be completely missing from the description.
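For reference, the one-time local hash check mentioned above can be sketched as follows. The function name `sha256_file` and the chunk size are assumptions for illustration. As the comment argues, this only catches transfer or storage corruption on your own machine; it says nothing to a remote user about what is actually being served.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's hash with a published checksum (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()
```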
robrenaud
today at 10:36 PM
Please serve well quantized models.
If you can get 99 percent of the quality for 50 percent of the cost, that is most times a good tradeoff.
bthornbury
today at 9:23 PM
Is modelwrap running on arbitrary clients? I'm not following the whole post, but how are you able to maintain confidence in client-owned hardware/disks, given the security model the method seems to depend on?
arcanemachiner
today at 8:22 PM
Call me an old fuddy-duddy, but my faith in the quality of your reporting really fell through the floor when I saw that the first image showed Spongebob Squarepants swearing at the worst-performing numbers.
EDIT: I read through the article, and it's a little over my head, but I'm intrigued. Does this actually work?
rhodey
today at 8:35 PM
In my opinion this is very well written
Two comments so far suggesting otherwise and I guess idk what their deal is
Attestation is taking off
LoganDark
today at 9:22 PM
I don't understand what stops an inference provider from giving you a hash of whatever they want. None of this proves that's what they're running, it only proves they know the correct answer. I can know the correct answer all I want, and then just do something different.
jMyles
today at 9:42 PM
Related but distinct: Is there an ELI5 about determinism in inference? In other words, when will the same prompt lead to the same output, and when not? And why not?
exceptione
today at 8:27 PM
The idea is that you run a workload at a model provider who might cheat you by altering the model they offer, right? So how does this help? If the provider wants to cheat (they apparently do), wouldn't they be able to swap the modelwrap container, or maybe even do some shenanigans with the filesystem?
I am ignorant about this ecosystem, so I might be missing something obvious.
45dsilicon
today at 7:56 PM
[dead]
cmrx64
today at 9:45 PM
https://hellas.ai is building out their category theoretic compiler and protocol for solving this issue
