Tiled Hacker news on React Router

Speed up responses with fast mode

71 points - today at 6:08 PM

kristianp
today at 10:38 PM
This is gold for Anthropic's profitability. The Claude Code addicts can double their spend to plow through tokens because they need to finish something by a deadline. OpenAI will have a similar product within a week but will only charge 3x the normal rate.
This angle might also be Nvidia's reason for buying Groq. People will pay a premium for faster tokens.
Nition
today at 7:27 PM
Note that you can't use this mode to get the most out of a subscription - they say it's always charged as extra usage:
> Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan’s included usage and are charged at the fast mode rate from the first token.
Although if you visit the Usage screen right now, there's a deal you can claim for $50 free extra usage this month.
paxys
today at 8:02 PM
Looking at the "Decide when to use fast mode", it seems the future they want is:
- Long running autonomous agents and background tasks use regular processing.
- "Human in the loop" scenarios use fast mode.
Which makes perfect sense, but the question is - does the billing also make sense?
jawon
today at 9:41 PM
I was thinking about in-house model inference speeds at frontier labs like Anthropic and OpenAI after reading the "Claude built a C compiler" article.
Having higher inference speed would be an advantage, especially if you're trying to eat all the software and services.
Anthropic offering 2.5x makes me assume they have 5x or 10x themselves.
In the predicted nightmare future where everything happens via agents negotiating with agents, the side with the most compute, and the fastest compute, is going to steamroll everyone.
IMTDb
today at 7:28 PM
I’m curious what’s behind the speed improvements. It seems unlikely it’s just prioritization, so what else is changing? Is it new hardware (à la Groq or Cerebras)? That seems plausible, especially since it isn’t available on some cloud providers.
Also wondering whether we’ll soon see separate “speed” vs “cleverness” pricing on other LLM providers too.
rustyhancock
today at 9:26 PM
At this point why don't we just CNAME HN to the Claude marketing blog?
dmix
today at 10:07 PM
I really like Anthropic's web design. This doc site looks like it's using GitBook (or a clone of GitBook) but they make it look so nice.
clbrmbr
today at 7:53 PM
I’d love to hear from engineers who find that faster speed is a big unlock for them.
The deadline piece is really interesting. I suppose there’s a lot of people now who are basically limited by how fast their agents can run and on very aggressive timelines with funders breathing down their necks?
simonw
today at 7:18 PM
The one question I have that isn't answered by the page is how much faster?
Obviously they can't make promises but I'd still like a rough indication of how much this might improve the speed of responses.
l5870uoo9y
today at 8:05 PM
It doesn’t say how much faster it is, but in my experience with OpenAI’s “service_tier=priority” option on SQLAI.ai, it’s twice as fast.
niobe
today at 9:23 PM
So fast mode uses more tokens, in direct opposition to Gemini where fast 'mode' means less. One more piece of useless knowledge to remember.
jhack
today at 7:50 PM
The pricing on this is absolutely nuts.
pronik
today at 7:26 PM
While it's an excellent way to make more money in the moment, I think this might become a standard no-extra-cost feature in several months (see Opus becoming way cheaper and a default model within months). Mental load management while using agents will become even more important it seems.
1123581321
today at 7:07 PM
Could be a use for the $50 extra usage credit. It requires extra usage to be enabled.
> Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan’s included usage and are charged at the fast mode rate from the first token.
maz1b
today at 7:57 PM
AFAIK, they don't have any deals or partnerships with Groq or Cerebras or any of those kinds of companies.. so how did they do this?
simianwords
today at 8:21 PM
Whatever optimisation is going on is at the hardware level since the fast option persists in a session.
esafak
today at 8:00 PM
It's a good way to address the price insensitive segment. As long as they don't slow down the rest, good move.
pedropaulovc
today at 7:25 PM
Where is this perf gain coming from? Running on TPUs?
krm01
today at 7:22 PM
Will this mean that when cost is more important than latency, replies will now take longer?
I’m not in favor of the ad model ChatGPT proposes. But business models like these suffer from similar traps.
If it works for them, then the logical next step is to convert more to use fast mode. Which naturally means to slow things down for those that didn’t pick/pay for fast mode.
We’ve seen it with iPhones being slowed down to make the newer model seem faster.
Not saying it’ll happen. I love Claude. But these business models almost always invite dark patterns in order to move the bottom line.
jonplackett
today at 10:06 PM
Is this the beginning of the ‘Speedy boarding’ / ‘Fastest delivery’ enshittification?
Where everyone is forced to pay for a speed up because the ‘normal’ service just gets slower and slower.
I hope not. But I fear.
solidasparagus
today at 7:30 PM
I pay $200 a month and don't get any included access to this? Ridiculous
thisisauserid
today at 9:33 PM
Instead of better/cheaper/faster you just get the last one?
Back to Gemini.
AnotherGoodName
today at 9:31 PM
But waiting for the agent to finish is my 2026 equivalent of "compiling!"
https://xkcd.com/303/
hmokiguess
today at 7:41 PM
Give me a slow mode that’s cheaper instead lol
thehamkercat
today at 6:39 PM
Interesting, output price is insane/Mtok
speedping
today at 7:23 PM
> $30/150 MTok
Umm no thank you
henning
today at 10:13 PM
LLM programming is very easy. First you have to prompt it to not mistakes. Then you have to tell it to go fast. Software engineering is over bro, all humans will be replaced in 6 days bro
aabhay
today at 9:16 PM
What is “$30/150MTok”? Claude Opus 4.6 is normally priced at “$25/MTok”. Am I just reading it wrong or is this a typo?
EDIT: I understand now. $30 for input, $150 for output. Very confusing wording. That’s insanely expensive!
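To make the "$30/150 MTok" reading above concrete, here is a minimal sketch of the arithmetic. It assumes only the rates quoted in this thread ($30 per million input tokens, $150 per million output tokens). The function name and the token counts in the example are hypothetical, chosen for illustration, and are not taken from any official pricing tool.

```python
# Rates as quoted in the thread (USD per million tokens); treat as assumptions.
FAST_INPUT_USD_PER_MTOK = 30.0
FAST_OUTPUT_USD_PER_MTOK = 150.0


def fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated fast mode cost in USD for one request."""
    total = (input_tokens * FAST_INPUT_USD_PER_MTOK
             + output_tokens * FAST_OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Hypothetical request: 200k input tokens and 20k output tokens.
print(f"${fast_mode_cost(200_000, 20_000):.2f}")
```

At these rates the output side dominates quickly: against the $25/MTok normal output price mentioned above, $150/MTok is a 6x multiple.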
