Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

83 points - today at 4:18 PM

satvikpendem
today at 5:32 PM
Unsloth has a collection as well [0], along with their results [1]. It looks like their quants get very close to 100% of the accuracy of the unquantized BF16 model, and they score better than Google's original QAT checkpoints posted in the article.
Personally, I'm using the 2B model for web search and structured JSON output via Unsloth Studio and its API. It works very well for that, even with the model embedded on phones.
[0] https://huggingface.co/collections/unsloth/gemma-4-qat
[1] https://unsloth.ai/docs/models/gemma-4/qat#qat-analysis
minimaxir
today at 5:14 PM
It's a bit awkward to release Gemma 4 12B (https://news.ycombinator.com/item?id=48385906), and then a canonical Q4_0 Gemma 4 12B a couple days later.
It's good that this post lists the expected VRAM usage for the models, with Q4_0 Gemma 4 12B at 6.7GB. That does support Google's claim that the model fits comfortably within 16GB, although it confirms that only the quantized version will do so.
Relatedly, in Google's newly released Edge Gallery for macOS, Gemma 4 12B is explicitly listed as unsupported due to not enough RAM even on a 16GB machine, but given the expected VRAM usage here the Q4_0 variant definitely should fit and Google should fix that.
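For anyone who wants to sanity-check the 6.7GB figure, here is a rough back-of-the-envelope sketch. It assumes the standard GGML Q4_0 layout (blocks of 32 weights, each block holding 16 bytes of 4-bit values plus one 2-byte fp16 scale) and assumes the model has exactly 12 billion parameters, all stored as Q4_0. Both are simplifications: real checkpoints often keep some tensors at higher precision, and the KV cache and activations come on top. The function names are made up for illustration.

```python
# Rough weight-memory estimate for a Q4_0 checkpoint.
# Assumption: GGML Q4_0 layout, 32 weights per block,
# 16 bytes of packed 4-bit values + 2 bytes for one fp16 scale.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 16 + 2


def q4_0_bits_per_weight() -> float:
    """Effective bits per weight once the per-block scale is included."""
    return BLOCK_BYTES * 8 / BLOCK_WEIGHTS


def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "12B" taken literally as 12e9 parameters.
N_PARAMS = 12e9

q4_gb = estimate_weight_gb(N_PARAMS, q4_0_bits_per_weight())
bf16_gb = estimate_weight_gb(N_PARAMS, 16)

print(f"Q4_0 bits/weight: {q4_0_bits_per_weight():.2f}")
print(f"Q4_0 weights:     {q4_gb:.2f} GB")
print(f"BF16 weights:     {bf16_gb:.2f} GB")
```

Under those assumptions the Q4_0 weights land in the same ballpark as the 6.7GB quoted in the post, while the BF16 weights alone would already exceed 16GB.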
somewhatrandom9
today at 6:09 PM
Could these quantized models make MTP (Multi-Token Prediction) faster when used in conjunction with larger Gemma 4 models?
cr3cr3
today at 6:05 PM
For a moment I got excited thinking QAT is Intel Quick Assist Technology...
netdur
today at 5:16 PM
Had a good run with Gemma 4 E2B Unsloth 4Q: https://youtube.com/shorts/XLsAnz5aAAI
The E4B model doesn't fit on my phone's TPU, so it swaps to RAM. The QAT version means more accuracy, good!
refulgentis
today at 5:22 PM
@google.com'ers, there are no GGUFs (the blog says there are).
comparedge
today at 6:21 PM
[flagged]
