Adaptive LLM routing under budget constraints

206 points - 09/01/2025

Source
  • pbd

    09/01/2025

    GPT-4 at $24.7 per million tokens vs Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
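
    Rough back-of-the-envelope in Python, pessimistically assuming every misroute means paying for the cheap attempt and then retrying on GPT-4 (prices as above):

        # Expected cost per million tokens with an imperfect router.
        GPT4_COST = 24.70      # $/M tokens
        MIXTRAL_COST = 0.24    # $/M tokens
        ERROR_RATE = 0.20      # how often the router picks wrong

        # Worst case: every misroute is retried on GPT-4, so you pay twice.
        expected = (1 - ERROR_RATE) * MIXTRAL_COST + ERROR_RATE * (MIXTRAL_COST + GPT4_COST)
        print(f"${expected:.2f}/M vs ${GPT4_COST}/M")  # ~$5.18/M -- still ~5x cheaper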

      • FINDarkside

        09/01/2025

        It's trivial to get a better score than GPT-4 at 1% of the cost by using my proprietary routing algorithm, which routes all requests to Gemini 2.5 Flash. It's called GASP (Gemini Always, Save Pennies).

          • nutjob2

            09/01/2025

            Does anyone working in an individual capacity actually end up paying for Gemini (Flash or Pro)? Or does Google boil you like a frog and you end up subscribing?

              • baq

                09/02/2025

                If I actually had time to work on my hobby projects Gemini pro would be the first thing I’d spend money on. As is, it’s amazing how much progress you can squeeze out of those 5 chats every 24h; I can get a couple hours of before-times hacking done in 15 minutes, which is incidentally when free usage gets throttled and my free time runs out.

                • aspect8445

                  09/01/2025

                  I've used Gemini in a lot of personal projects. At this point I've probably made tens of thousands of requests, sometimes exceeding 1k per week. So far, I haven't had to pay a dime!

                    • worm00111

                      09/01/2025

                      How come you don't need to pay? Do you get it for free somehow?

                        • KETHERCORTEX

                          09/01/2025

                          There's a free tier for the API.

                            • drittich

                              09/02/2025

                              "When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.

                              To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services."

                              Reference: https://ai.google.dev/gemini-api/terms

                  • ivape

                    09/02/2025

                    You get 1500 prompts on AI Studio across a few Gemini Flash models. I think I saw 250 or 500 for 2.5. It's basically free and beats the consumer rate limits of the big apps (Claude, ChatGPT, Gemini, Meta). I wonder when they'll cut this off.

                    • dcre

                      09/01/2025

                      I've paid a few dollars a month for my API usage for about 6 months.

              • simpaticoder

                09/01/2025

                PPT (price per token) is insufficient to compute cost. You also need to know the average tokens per interaction (TPI). The two multiply to give you a cost estimate: a 0.01x PPT is wiped out by a 100x TPI.
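
                A toy example in Python - the TPI numbers are made up purely to show the multiplication:

                    # cost per interaction = PPT x TPI (made-up illustrative numbers)
                    models = {
                        # name: ($ per million tokens, avg tokens per interaction)
                        "cheap-but-verbose": (0.24, 50_000),  # 0.01x price, 100x tokens
                        "pricey-but-terse": (24.70, 500),
                    }
                    for name, (ppt, tpi) in models.items():
                        print(f"{name}: ${ppt / 1e6 * tpi:.4f} per interaction")
                    # cheap-but-verbose: $0.0120, pricey-but-terse: $0.0124 -- a wash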

                  • monsieurbanana

                    09/01/2025

                    Are you saying that some models will take 100x more tokens than others (models in the same ballpark) for the same task? Is the 100 a real measured metric or just a random number to illustrate a point?

                      • simpaticoder

                        09/01/2025

                        With thinking models, 100x is not just possible but probable. You get charged for the intermediate thinking tokens even if you don't see them (as is the case with Grok, for example): a 50-token answer preceded by 5,000 tokens of hidden reasoning bills at roughly 100x the visible output. And even when you do see them, they won't necessarily add value.

                          • monsieurbanana

                            09/03/2025

                            > With thinking models, yes 100x is not just possible, but probable

                            So the answer is no then, because I don't put reasoning and non-reasoning models in the same ballpark when it comes to token usage. You can just turn off reasoning.

                        • datadrivenangel

                          09/02/2025

                          The GPT-5 models use ~10x more tokens, depending on the reasoning settings.

                  • Keyframe

                    09/01/2025

                    number of complaints / million tokens?

                    • mkoubaa

                      09/01/2025

                      > How you measure 'performance'

                      I heard the best way is through valuations

                      • pqtyw

                        09/01/2025

                        > GPT-4 at $24.7 per million tokens

                        While technically true, why would you want to use it when OpenAI itself provides a bunch of models that are many times cheaper and better?

                          • KTibow

                            09/01/2025

                            RouterBench is from March 2024.

                    • QuadmasterXLII

                      09/01/2025

                      The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.

                        • dang

                          09/01/2025

                          (The submitted title was "93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback")

                      • spoaceman7777

                        09/01/2025

                        Incredible that they are using contextual bandits, and named it: Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)

                        Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)
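
                        For anyone curious what's behind the acronym, here's a minimal sketch of plain LinUCB over model "arms" - illustrative only; the paper's PILOT additionally folds in a preference prior and a budget:

                            import numpy as np

                            # Vanilla LinUCB: one linear reward model per candidate LLM ("arm").
                            class LinUCBRouter:
                                def __init__(self, n_models, dim, alpha=1.0):
                                    self.alpha = alpha                                 # exploration strength
                                    self.A = [np.eye(dim) for _ in range(n_models)]    # per-arm covariance
                                    self.b = [np.zeros(dim) for _ in range(n_models)]  # per-arm reward sums

                                def route(self, x):
                                    # x: feature vector for the query, e.g. an embedding
                                    scores = []
                                    for A, b in zip(self.A, self.b):
                                        A_inv = np.linalg.inv(A)
                                        theta = A_inv @ b  # ridge estimate of arm quality
                                        scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
                                    return int(np.argmax(scores))  # highest mean + exploration bonus

                                def update(self, arm, x, reward):
                                    # reward: e.g. weak thumbs-up/down feedback on the chosen model
                                    self.A[arm] += np.outer(x, x)
                                    self.b[arm] += reward * x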

                          • bhickey

                            09/01/2025

                            That's pretty funny. I might need to pilfer it.

                        • fny

                          09/01/2025

                          Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?

                            • delichon

                              09/01/2025

                              > a strong enough notion of question complexity

                              Aka wisdom. No, LLMs don't have that. Me neither; I usually have to step into the rabbit holes in order to detect them.

                                • fny

                                  09/01/2025

                                  "Do you think you need to do high/medium/low amount of thinking to answer X?" seems well within an LLMs wheelhouse if the goal is to build an optimized routing engine.

                                    • nutjob2

                                      09/01/2025

                                      How do you think that an LLM could come by that information? Do you think that LLM vendors are logging performance and feeding that back into the model or some other mechanism?

                            • jibal

                              09/01/2025

                              LLMs don't have notions ... they are pattern matchers against a vast database of human text.

                                • mhh__

                                  09/01/2025

                                  Please do a SELECT * from this database

                                    • ashirviskas

                                      09/01/2025

                                      What was the name of the rocket that brought the first humans into space?

                                • imtringued

                                  09/02/2025

                                  This is like asking someone to make you a sandwich and expecting them to read your mind to determine what kind of sandwich you want.

                              • hackathonguy

                                09/02/2025

                                I'm very curious whether a) anecdotally, anyone has encountered a real enterprise cost-cutting effort focused on LLM APIs, and b) empirically, whether anyone has done research on price elasticity across LLMs of different performance scales.

                                So far, my experience has been that it's just too early for most people / applications to worry about cost - at most, I've seen AI account for 10% of cloud costs. But I'm very curious whether others have had different experiences.

                                  • dahcryn

                                    09/02/2025

                                    LLMs are far from the highest AI-related cost, so we basically don't care about optimizing them.

                                    Obviously we don't use the super expensive ones like GPT-4.5. But we don't really bother with mini models either, because GPT-4.1 etc. are cheap enough.

                                    Things like speech-to-text are still way more expensive, and yes, there we do focus on cost optimization. We have no large-scale image generation use cases (yet).

                                    • baq

                                      09/02/2025

                                      In which context? The serious engineering folks are still in the exploration phase; cost is mostly not a concern as long as shipping velocity increases. Reselling repackaged tokens is a different beast - no experience here.

                                  • lewtun

                                    09/01/2025

                                    > We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB

                                    Academics are pretty creative at naming their creations

                                      • CuriouslyC

                                        09/01/2025

                                        I almost named my LoRA replacement BEMO, but that felt too cute, so it's just BEM (Bolt-on Expert Modules).

                                    • CuriouslyC

                                      09/01/2025

                                      These router papers are popping up everywhere now. I have a gradient-boosted router I've been playing with that ties into retrieval to provide adaptive routing. The truth about these routers is that you have to tune them on your workloads to get the full benefit; otherwise they test way better than they work in production. That's why I added the retrieval aspect to mine - otherwise your top-line slice and reality are very different.
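
                                      The shape of it, heavily simplified - in the real thing the features include retrieval-similarity stats, and the training labels come from your own traffic:

                                          from sklearn.ensemble import GradientBoostingClassifier

                                          # Simplified sketch of a learned router: per-query features
                                          # (e.g. embedding + retrieval stats), label = which model
                                          # handled that query well on YOUR workload, not a benchmark.
                                          def train_router(X, best_model_ids):
                                              router = GradientBoostingClassifier()
                                              router.fit(X, best_model_ids)
                                              return router

                                          def pick_model(router, query_features):
                                              return router.predict([query_features])[0]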

                                      • danieltanfh95

                                        09/02/2025

                                        Unless your application is relatively trivial, you would always want behaviour that is as consistent as possible, rather than optimizing some random metric used as a proxy for "performance". Routing is NOT the solution.

                                        • axiom92

                                          09/01/2025

                                          From last NeurIPS: https://automix-llm.github.io/automix/

                                          • andrewflnr

                                            09/01/2025

                                            Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.

                                            Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.

                                              • kenjackson

                                                09/01/2025

                                                First, I don't think we will ever get to AGI - not because we won't see huge advances still, but because AGI is a moving, ambiguous target that we won't get consensus on.

                                                But why does this paper impact your thinking on it? It is about budgets and recognizing that different LLMs have different cost structures. It's not really an attempt to improve LLM performance measured absolutely.

                                                  • ACCount37

                                                    09/01/2025

                                                    I can totally see "it's not really AGI because it doesn't consistently outperform those three top 0.000001% outlier human experts yet if they work together".

                                                    It'll be a while until the ability to move the goalposts of "actual intelligence" is exhausted entirely.

                                                      • 9dev

                                                        09/01/2025

                                                        Well, right now my 7-year-old niece outperforms all LLM contenders at drawing a pelican on a bicycle.

                                                          • kenjackson

                                                            09/01/2025

                                                            I know this was a joke, but LLMs are quite good at this now. If your niece draws better, then she's a good artist.

                                                            • neuronexmachina

                                                              09/02/2025

                                                              I tried it in Gemini just now, it seems to have done a decent job: https://g.co/gemini/share/b6fef8398c01

                                                      • _heimdall

                                                        09/01/2025

                                                        So you don't expect AGI to be possible, ever? Or is your concern mainly with the wildly different definitions people use for it - that we'll keep moving the goal posts rather than agree we got there?

                                                          • nutjob2

                                                            09/01/2025

                                                            There's no concrete evidence AGI is possible, mostly because it has no concrete definition.

                                                            It's mostly hand-waving, hype, credulity, and unproven claims of scalability right now.

                                                            You can't move the goal posts because they don't exist.

                                                              • _heimdall

                                                                09/02/2025

                                                                Got it - and yeah, I agree with you there. I've been frustrated by a different aspect of it, though: many people do seem to have a definition, and those definitions are often wildly different.

                                                                • dahcryn

                                                                  09/02/2025

                                                                  Even AI does not have a concrete definition. That doesn't mean there aren't practical definitions depending on the context.

                                                                  In essence, teaching an AI using resources meant for humans, and nothing more, would be considered AGI. That could be a practical definition, without needing much more rigour.

                                                                  There is indeed no evidence we'll get there. But there is also no evidence LLMs should work as well as they do.

                                                                  • ashirviskas

                                                                    09/01/2025

                                                                    Well, if a human is GI, we just need to make it Artificial. Easy.

                                                                      • abalashov

                                                                        09/02/2025

                                                                        I like to say that it's not AI -- it's just A.

                                                            • baq

                                                              09/02/2025

                                                              Given OpenAI's definition, I'd expect AGI to be around in a decade or two. I don't expect Skynet, though; maybe a more realistic vision of the outcome is just droids mixing with humans.

                                                          • jibal

                                                            09/01/2025

                                                            LLMs are not on the road to AGI, but there are plenty of dangers associated with them nonetheless.

                                                              • andrewflnr

                                                                09/01/2025

                                                                Agreed, broadly. I never really thought they were, but seeing people work on stuff like this instead of even trying to improve the architecture really makes it obvious.

                                                                • nicce

                                                                  09/01/2025

                                                                  Just two days ago, Gemini 2.5 Pro tried to recommend tax evasion to me, based on non-existent laws and court decisions. The model was so charming and convincing that even after I pointed out all the logical flaws and said that this was plain wrong, I started to doubt myself, because it is so good at pleasing, arguing, and using words.

                                                                  And most people would have accepted the recommendation, because the model sold it as a less common tactic while sounding very logical.

                                                                    • nutjob2

                                                                      09/01/2025

                                                                      Or you could understand the tool you are using and be skeptical of any of its output.

                                                                      So many people just want to believe, instead of accepting the reality that LLMs are quite unreliable.

                                                                      Personally, it's usually fairly obvious to me when LLMs are bullshitting, probably because I have lots of experience detecting it in humans.

                                                                        • nicce

                                                                          09/01/2025

                                                                          An LLM is only useful if it gives a shortcut to information with reasonable accuracy. If I need to double-check everything, it is just an extra step.

                                                                          In this case I just happened to be a domain expert and knew it was wrong. It would have taken a less experienced person significant effort to verify everything.

                                                                      • roywiggins

                                                                        09/01/2025

                                                                        > even after I brought all the logic flaws and said that this is plain wrong

                                                                        Once you've started to argue with an LLM you're already barking up the wrong tree. Maybe you're right, maybe not, but there's no point in arguing it out with an LLM.

                                                                          • nicce

                                                                            09/01/2025

                                                                            There are cases where they are actually correct and the human is wrong.

                                                                              • roywiggins

                                                                                09/01/2025

                                                                                Yes, and there's a substantial chance they'll apologize to you anyway even when they were right. There's no reason to expect their apologies to track whether they were actually right or wrong - their agreeableness is orthogonal to their correctness.

                                                                                  • nicce

                                                                                    09/01/2025

                                                                                    Yes, they over-apologize. But my main reason for using LLMs is to find things I missed myself, or places where my own argumentation was weak. Sometimes they are really good at bringing new perspectives. Whether they are correct or incorrect is not the point - are they giving an argument or perspective that is worth inspecting further with my own brain?

                                                                • ctoth

                                                                  09/01/2025

                                                                  Is a random paper from Fujitsu Research claiming to be the frontier of anything?

                                                                    • andrewflnr

                                                                      09/01/2025

                                                                      Not just this paper - model routing shenanigans also seem to have been a big part of GPT-5, which certainly claims to be frontier work.

                                                                  • srekhi

                                                                    09/01/2025

                                                                    I'm not following this either. You'd think this would have been the frontier back in 2023.

                                                                    • yahoozoo

                                                                      09/01/2025

                                                                      That, and LLMs seem to be plateauing. Earlier this year, it felt like the big companies were releasing noticeable improvements every other week. People would joke that a few weeks is "an eternity" in AI... so what time span are we looking at now?

                                                                          • andrewflnr

                                                                            09/01/2025

                                                                            That's just the thing. There don't seem to have been any breakthroughs in model performance or architecture, so it seems like we're back to picking up marginal reductions in cost to make any progress.

                                                                            • muldvarp

                                                                              09/01/2025

                                                                              There have been very large improvements in code generation in the last 6 months. A few weeks without improvement is not necessarily a plateau.

                                                                                • ACCount37

                                                                                  09/01/2025

                                                                                  Wait until it ramps up so much that people will say "it's a plateau, for real this time" when they go 3 days without a +10% capability jump.

                                                                                    • muldvarp

                                                                                      09/01/2025

                                                                                      I mean, I wish there were a plateau; without one, we're well on our way to techno-feudalism. I just don't see it.

                                                                                        • ACCount37

                                                                                          09/01/2025

                                                                                          That's what it is: wishful thinking. A lot of people really, really want AI tech to fail - because they don't like the alternative.

                                                                                            • muldvarp

                                                                                              09/02/2025

                                                                                              Yeah - obviously nobody who has actually thought about the consequences wants a large part of the population to become unemployed. Even if your job is not threatened by automation, it will be threatened by a lot of people looking for new jobs.

                                                                                              And the kind of automation brought by LLMs is decidedly different from automation in the past, which almost always created new (usually better) jobs. LLMs won't do this (at least not to an extent where it would matter), I think. Most people in ten years will have worse jobs (more physically straining, longer hours, less pay) unless there is political intervention.

                                                                          • yieldcrv

                                                                            09/01/2025

                                                                            Just because it's on arXiv doesn't mean anything.

                                                                            arXiv is essentially a blog in an academic format, popular among Asian and South Asian academic communities.

                                                                            Currently you can launder reputation with it, just like "white papers" in the crypto world attracted capital for some time.

                                                                            This ability will diminish as more people catch on.

                                                                              • dahcryn

                                                                                09/02/2025

                                                                                arXiv should really have a big red banner saying "NOT REVIEWED - DON'T USE AS A SOURCE" or something.

                                                                            • guluarte

                                                                              09/01/2025

                                                                              I'm starting to think there will not be an "AGI" moment; we will simply build smarter and smarter machines until we realize we have "AGI". It would be like video calls: in the '90s everybody wanted them, and now everybody hates them, lmao.

                                                                                • nutjob2

                                                                                  09/01/2025

                                                                                  Or we'll realize that human intelligence and machine intelligence are apples and oranges.

                                                                          • westurner

                                                                            09/01/2025

                                                                            Would there be advantages to routing to models according to cost in conjunction with prompt rewriting?
