Tiled Hacker news on React Router

Why are your models so big? (2023)

38 points - 12/02/2025

Source

unleaded
12/06/2025
Still relevant today. Many problems people throw onto LLMs can be done more efficiently with text completion than begging a model 20x the size (and probably more than 20x the cost) to produce the right structured output. https://www.reddit.com/r/LocalLLaMA/comments/1859qry/is_anyo...
lsb
12/06/2025
My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.
A Pi has 4 cores and 16GB of memory these days, so, running Qwen3 4B on a pi is pretty comfortable: https://leebutterman.com/2025/11/01/prompt-optimization-on-a...
lynndotpy
12/06/2025
> I think the future will be full of much smaller models trained to do specific tasks.
This was the very recent past! Up until we got LLM-crazy in 2021, this was the primary thing that deep learning papers produced: New models meant to solve very specific tasks.
jgalt212
12/06/2025
The net $5.5T the fed printed had to go somewhere. AI Arms Race was the answer. And when the models got good, then we needed agentic to create unbounded demand for inference just as there was unbounded demand for training.
https://fred.stlouisfed.org/series/WALCL
siddboots
12/05/2025
I think I have almost the opposite intuition. The fact that attention models are capable of making sophisticated logical constructions within a recursive grammar, even for a simple DSL like SQL, is kind of surprising. I think it’s likely that this property does depend on training on a very large and more general corpus, and hence demands the full parameter space that we need for conversational writing.
brainless
12/06/2025
May I add Gliner to this? The original Python version and the Rust version. Fantastic (non LLM) models for entity extraction. There are many others.
I really think using small models for a lot of smell tasks is the best way forward but it's not easy to orchestrate.
K0IN
12/06/2025
Im always so surprised that embedding models we had for years like minlm (80mb) are so small, and I really wonder why not more on device searches use something like it.
semiinfinitely
12/06/2025
I don’t understand why today’s laptops are so large. Some of the smallest "ultrabooks" getting coverage sit at 13 inches, but even this seems pretty big to me.
If you need raw compute, I totally get it. Things like compiling the Linux kernel or training local models require a high level of thermal headroom, and the chassis has to dissipate heat in a manner that prevents throttling. In cases where you want the machine to act like a portable workstation, it makes sense that the form factor would need to be a little juiced up.
That said, computing is a whole lot more than just heavy development work. There are some domains that have a tightly-scoped set of inputs and require the user to interact in a very simple way. Something like responding to an email is a good example — typing "LGTM" requires a very small screen area, and it requires no physical keyboard or active cooling. checking the weather is similar: you don’t need 16 inches of screen real estate to go from wondering if it’s raining to seeing a cloud icon.
I say all this because portability is expensive. Not only is it expensive in terms of back pain — maintaining the ecosystem required to run these machines gets pretty complicated. You either end up shelling out money for specialized backpacks or fighting for outlet space at a coffee shop just to keep the thing running. In either case, you’re paying big money (and calorie) costs every time a user types remind me to eat a sandwich.
I think the future will be full of much smaller devices. Some hardware to build these already exists, and you can even fit them in your pocket. This mode of deployment is inspiring to me, and I’m optimistic about a future where 6.1 inches is all you need.
debo_
12/06/2025
2000: My spoon is too big
2023: My model is too big
musicandpiss
12/06/2025
thank you for sharing.-)

Why are your models so big? (2023)

unleaded

crystal_revenge

hippo22

_ea1k

lsb

lynndotpy

_ea1k

socketcluster

forgotTheLast

jgalt212

lioeters

siddboots

brainless

K0IN

semiinfinitely

hobs

bee_rider

tebruno99

Archelaos

debo_

musicandpiss