Tiled Hacker news on React Router

Adaptive LLM routing under budget constraints

206 points - 09/01/2025


pbd
09/01/2025
GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
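To put rough numbers on that, here is a back-of-the-envelope sketch using the two prices quoted above. The cost model is an assumption for illustration only: every query is tried on the cheap model first, and the 20% the router gets wrong are retried on the expensive one. The function name and the retry policy are hypothetical and not taken from the paper.

```python
# Prices in USD per million tokens, as quoted in the comment above.
GPT4_COST = 24.7
MIXTRAL_COST = 0.24


def cost_with_fallback(cheap, expensive, error_rate):
    """Average cost per million tokens under an assumed retry policy.

    Assumption: every query pays for the cheap model once, and the
    fraction `error_rate` that it handles badly is retried on the
    expensive model.
    """
    return cheap + error_rate * expensive


price_ratio = GPT4_COST / MIXTRAL_COST
routed_cost = cost_with_fallback(MIXTRAL_COST, GPT4_COST, error_rate=0.2)
savings = 1.0 - routed_cost / GPT4_COST

print(f"price ratio: {price_ratio:.1f}x")
print(f"routed cost: ${routed_cost:.2f} per million tokens")
print(f"savings vs. always using GPT-4: {savings:.1%}")
```

Under these assumptions the routed setup still costs roughly a fifth of sending everything to GPT-4. It says nothing about answer quality, which is the harder measurement problem raised above.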
QuadmasterXLII
09/01/2025
The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.
spoaceman7777
09/01/2025
Incredible that they are using contextual bandits, and named it: Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)
Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)
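For readers who have not met the contextual bandit underneath the acronym, here is a minimal sketch of plain disjoint LinUCB, with one arm per candidate model. It is not the paper's PILOT method: it has no preference prior and no budget constraint. The class and function names, the two-dimensional toy contexts, and the rewards are all made up for illustration.

```python
import math


class LinUCBArm:
    """One arm (one candidate model) of disjoint LinUCB."""

    def __init__(self, dim):
        # A starts as the identity (ridge prior), so its inverse does too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec)))
                for row in self.a_inv]

    def ucb(self, x, alpha):
        """Estimated reward for context x plus an exploration bonus."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Record an observed reward for context x."""
        # Sherman-Morrison rank-one update of A^-1 for A <- A + x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x, alpha=1.0):
    """Index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


# Toy data: arm 0 does well on "easy" queries, arm 1 on "hard" ones.
EASY, HARD = [1.0, 0.0], [0.0, 1.0]
arms = [LinUCBArm(2), LinUCBArm(2)]
for _ in range(20):
    arms[0].update(EASY, 1.0)
    arms[0].update(HARD, 0.0)
    arms[1].update(EASY, 0.0)
    arms[1].update(HARD, 1.0)

print(choose_arm(arms, EASY, alpha=0.1), choose_arm(arms, HARD, alpha=0.1))
```

In a real router the context would be a query embedding and the reward some quality signal. Here both arms are fed the same fixed feedback only to keep the demo deterministic; an online bandit would update just the arm it actually chose.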
fny
09/01/2025
Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?
hackathonguy
09/02/2025
I'm very curious whether a) anecdotally, anyone has encountered a real enterprise cost-cutting effort focused on LLM APIs and b) empirically, whether anyone has done any research on price elasticity in LLMs of different performance scales.
So far, my experience has been that it's just too early for most people / applications to worry about cost - at most, I've seen AI account for 10% of cloud costs. But I'm very curious whether others have had different experiences.
lewtun
09/01/2025
> We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB
Academics are pretty creative at naming their creations
CuriouslyC
09/01/2025
These router papers are popping up hard now. I have a gradient boosted router I've been playing with that ties into retrieval to provide adaptive routing. The truth about these routers is that you have to tune them on your workloads to get the full benefit, otherwise they test way better than they work in production. That was why I added the retrieval aspect to mine, otherwise your top line slice and reality are very different.
danieltanfh95
09/02/2025
Unless your application is relatively trivial, you would always want behaviour that is as consistent as possible rather than chasing some random metric used as a proxy for "performance". Routing is NOT the solution.
axiom92
09/01/2025
From last NeurIPS: https://automix-llm.github.io/automix/
andrewflnr
09/01/2025
Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.
Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.
westurner
09/01/2025
Would there be advantages to routing to models according to cost in conjunction with prompt rewriting?
