
Claude Opus 4.6

1274 points - today at 5:38 PM

Source
  • ck_one

    today at 9:38 PM

    Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

    All 7 books come to ~1.75M tokens, so they don't quite fit yet. (At this rate of progress, mid-April should do it.) For now you can fit the first 4 books (~733K tokens).

    Results: Opus 4.6 found 49 out of 50 officially documented spells across those 4 books. The only miss was "Slugulus Eructo" (a vomiting spell).

    Freaking impressive!
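
    (For anyone who wants to sanity-check the token math, a minimal sketch using tiktoken, an OpenAI tokenizer, so the counts will differ somewhat from Claude's own tokenizer; the file path is hypothetical.)

      import tiktoken

      # Rough token count for a local text file; treat the number as an
      # approximation of whether the text fits in a 1M-token context window.
      enc = tiktoken.get_encoding("cl100k_base")
      with open("hp_books_1_to_4.txt", encoding="utf-8") as f:  # hypothetical file
          text = f.read()
      print(f"~{len(enc.encode(text)):,} tokens")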

      • golfer

        today at 10:41 PM

        There are lots of websites that list the spells. It's well documented. Could Claude simply be regurgitating knowledge from the web? Example:

        https://harrypotter.fandom.com/wiki/List_of_spells

          • ck_one

            today at 10:44 PM

            It didn't use web search. But for sure it has some internal knowledge already. It's not a perfect needle-in-the-haystack problem, but Gemini Flash was much worse when I tested it last time.

        • xiomrze

          today at 10:06 PM

          Honest question: how do you know whether it's pulling from context vs. from memory?

          If I use Opus 4.6 with Extended Thinking (Web Search disabled, no books attached), it answers with 130 spells.

            • petercooper

              today at 10:18 PM

              One possible trick could be to search-and-replace them all with nonsense alternatives, then see if it extracts those.
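
              A minimal sketch of that idea, assuming the books are in a local text file and you hand-build the replacement list (the filenames and all the nonsense names here are hypothetical):

                import re

                # Map real spell names to nonsense tokens, then ask the model to list
                # the "spells" in the rewritten text; hits on the nonsense names suggest
                # it is reading the context rather than reciting memorized spell lists.
                replacements = {
                    "Expelliarmus": "Fliggle Wonk",
                    "Wingardium Leviosa": "Borple Snozz",
                    "Expecto Patronum": "Quimble Drast",
                }

                with open("hp_books_1_to_4.txt", encoding="utf-8") as f:  # hypothetical file
                    text = f.read()

                for real, fake in replacements.items():
                    text = re.sub(re.escape(real), fake, text, flags=re.IGNORECASE)

                with open("hp_books_scrambled.txt", "w", encoding="utf-8") as f:
                    f.write(text)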

                • andai

                  today at 10:29 PM

                  That might actually boost performance since attention pays attention to stuff that stands out. If I make a typo, the models often hyperfixate on it.

              • ck_one

                today at 10:34 PM

                When I tried it without web search, so only internal knowledge, it missed ~15 spells.

                • ozim

                  today at 10:24 PM

                  Exactly. There was this study where they were trying to make an LLM reproduce the HP books word for word, giving it the first sentences and letting it cook.

                  Basically, with some tricks they managed to get 99% word for word. The tricks were needed to bypass the security measures that are in place for exactly this reason: to stop people from retrieving training material.

                    • ck_one

                      today at 10:35 PM

                      Do you remember what those tricks were?

                  • clanker_fluffer

                    today at 10:16 PM

                    What was your prompt?

                • meroes

                  today at 10:06 PM

                  What is this supposed to show exactly? Those books have been fed into LLMs for years, and there's likely even specific RLHF on extracting spells from HP.

                    • muzani

                      today at 10:33 PM

                      There was a time when I put the Ea-nasir text into base64 and asked an AI to convert it. Remarkably, it identified the correct text but pulled the most popular translation rather than the one I gave it.
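
                      Roughly what that test looks like (the passage wording here is a placeholder, not the actual tablet translation):

                        import base64

                        # Encode your own wording of the passage and ask the model to decode it.
                        # If the reply matches the popular published translation instead of the
                        # exact text you encoded, it is likely leaning on memorized knowledge.
                        passage = "Tell Ea-nasir: Nanni sends the following message ..."  # placeholder wording
                        encoded = base64.b64encode(passage.encode("utf-8")).decode("ascii")
                        print(encoded)  # paste this into the chat and compare the decoded output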

                      • rvz

                        today at 10:24 PM

                        > What is this supposed to show exactly?

                        Nothing.

                        You can be sure that this was already known from the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did are meaningless.

                        Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.

                    • bartman

                      today at 10:32 PM

                      Have you by any chance tried this with GPT 4.1 too (also 1M context)?

                      • LanceJones

                        today at 10:34 PM

                        Assuming this experiment involved isolating the LLM from its training set?

                        • zamadatix

                          today at 9:51 PM

                          To be fair, I don't think "Slugulus Eructo" (the name) is actually in the books. This is what's in my copy:

                          > The smug look on Malfoy’s face flickered.

                          > “No one asked your opinion, you filthy little Mudblood,” he spat.

                          > Harry knew at once that Malfoy had said something really bad because there was an instant uproar at his words. Flint had to dive in front of Malfoy to stop Fred and George jumping on him, Alicia shrieked, “How dare you!”, and Ron plunged his hand into his robes, pulled out his wand, yelling, “You’ll pay for that one, Malfoy!” and pointed it furiously under Flint’s arm at Malfoy’s face.

                          > A loud bang echoed around the stadium and a jet of green light shot out of the wrong end of Ron’s wand, hitting him in the stomach and sending him reeling backward onto the grass.

                          > “Ron! Ron! Are you all right?” squealed Hermione.

                          > Ron opened his mouth to speak, but no words came out. Instead he gave an almighty belch and several slugs dribbled out of his mouth onto his lap.

                            • ck_one

                              today at 10:30 PM

                              Then it's fair that it didn't find it.

                          • guluarte

                            today at 10:19 PM

                            You can get the same result just by asking Opus/GPT; it's probably internalized knowledge from Reddit or similar sites.

                              • ck_one

                                today at 10:33 PM

                                If you just ask it you don't get the same result. Around 13 spells were missing when I just prompted Opus 4.6 without the books as context.

                            • hbarka

                              today at 10:08 PM

                              If you wanted to fit all 7 books, would you use RAG or another solution?

                              • adarsh2321

                                today at 10:40 PM

                                [dead]

                            • gizmodo59

                              today at 6:14 PM

                              5.3 Codex (https://openai.com/index/introducing-gpt-5-3-codex/) crushes it with 77.3% on Terminal-Bench. The shortest-lived lead yet: less than 35 minutes. What a time to be alive!

                                • wasmainiac

                                  today at 7:10 PM

                                  Dumb question: can these benchmarks be trusted when model performance tends to vary depending on the hour and the load on OpenAI's servers? How do I know I'm not getting a severe penalty for chatting at the wrong time? Or even, are the models at their best right after launch and then slowly eroded to more economical settings after the hype wears off?

                                    • tedsanders

                                      today at 8:09 PM

                                      We don't vary our model quality with time of day or load (beyond negligible non-determinism). It's the same weights all day long with no quantization or other gimmicks. They can get slower under heavy load, though.

                                      (I'm from OpenAI.)

                                        • wasmainiac

                                          today at 10:50 PM

                                          Thanks for the response, I appreciate it. I do notice variation in quality throughout the day. I use it primarily for searching documentation since it's faster than Google in most cases; often it is on point, but it also seems off at times, inaccurate or shallow maybe. In some cases I just end the session.

                                          • zamadatix

                                            today at 9:43 PM

                                            I appreciate you taking the time to respond to these kinds of questions the last few days.

                                            • Trufa

                                              today at 8:18 PM

                                              Can you be more specific than this? Does it vary over time, from the launch of a model to the next few months, beyond tinkering and optimization?

                                                • tedsanders

                                                  today at 9:11 PM

                                                  Yeah, happy to be more specific. No intention of making any technically true but misleading statements.

                                                  The following are true:

                                                  - In our API, we don't change model weights or model behavior over time (e.g., by time of day, or weeks/months after release)

                                                  - Tiny caveats include: there is a bit of non-determinism in batched non-associative math that can vary by batch / hardware, bugs or API downtime can obviously change behavior, heavy load can slow down speeds, and this of course doesn't apply to the 'unpinned' models that are clearly supposed to change over time (e.g., xxx-latest). But we don't do any quantization or routing gimmicks that would change model weights.

                                                  - In ChatGPT and Codex CLI, model behavior can change over time (e.g., we might change a tool, update a system prompt, tweak default thinking time, run an A/B test, or ship other updates); we try to be transparent with our changelogs (listed below) but to be honest not every small change gets logged here. But even here we're not doing any gimmicks to cut quality by time of day or intentionally dumb down models after launch. Model behavior can change though, as can the product / prompt / harness.

                                                  ChatGPT release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-...

                                                  Codex changelog: https://developers.openai.com/codex/changelog/

                                                  Codex CLI commit history: https://github.com/openai/codex/commits/main/
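
                                                  (A small illustration of the "non-determinism in batched non-associative math" caveat above; this is generic floating-point behavior, not OpenAI's code. Summing the same numbers in a different order, as can happen with different batch compositions or hardware, gives slightly different results.)

                                                    import random

                                                    random.seed(0)
                                                    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

                                                    forward = sum(xs)                 # one summation order
                                                    backward = sum(reversed(xs))      # same numbers, different order

                                                    print(forward == backward)        # typically False
                                                    print(abs(forward - backward))    # tiny but nonzero difference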

                                                    • jychang

                                                      today at 9:30 PM

                                                      What about the juice variable?

                                                      https://www.reddit.com/r/OpenAI/comments/1qv77lq/chatgpt_low...

                                                        • tedsanders

                                                          today at 9:43 PM

                                                          Yep, we recently sped up default thinking times in ChatGPT, as now documented in the release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-...

                                                          The intention was purely making the product experience better, based on common feedback from people (including myself) that wait times were too long. Cost was not a goal here.

                                                          If you still want the higher reliability of longer thinking times, that option is not gone. You can manually select Extended (or Heavy, if you're a Pro user). It's the same as at launch (though we did inadvertently drop it last month and restored it yesterday after Tibor and others pointed it out).

                                                          • tgrowazay

                                                            today at 9:40 PM

                                                            Isn’t that just how many steps at most a reasoning model should do?

                                                        • ComplexSystems

                                                          today at 9:22 PM

                                                          Do you ever replace ChatGPT models with cheaper, distilled, quantized, etc ones to save cost?

                                                            • jghn

                                                              today at 9:28 PM

                                                              He literally said no to this in his GP post

                                                      • joshvm

                                                        today at 8:58 PM

                                                        My gut feeling is that performance is more heavily affected by harnesses which get updated frequently. This would explain why people feel that Claude is sometimes more stupid - that's actually accurate phrasing, because Sonnet is probably unchanged. Unless Anthropic also makes small A/B adjustments to weights and technically claims they don't do dynamic degradation/quantization based on load. Either way, both affect the quality of your responses.

                                                        It's worth checking different versions of Claude Code, and updating your tools if you don't do it automatically. Also run the same prompts through VS Code, Cursor, Claude Code in terminal, etc. You can get very different model responses based on the system prompt, what context is passed via the harness, how the rules are loaded and all sorts of minor tweaks.

                                                        If you make raw API calls and see behavioural changes over time, that would be another concern.

                                                    • Someone1234

                                                      today at 8:29 PM

                                                      Specifically including routing (i.e. which model you route to based on load/ToD)?

                                                      PS - I appreciate you coming here and commenting!

                                                        • hhh

                                                          today at 8:32 PM

                                                          There is no routing with the API, or when you choose a specific model in ChatGPT.

                                                  • Corence

                                                    today at 7:31 PM

                                                    It is a fair question. I'd expect the numbers are all real. Competitors are going to rerun the benchmark with these models to see how the model responds and succeeds on the tasks, and use that information to figure out how to improve their own models. If the benchmark numbers aren't real, their competitors will call out that they're not reproducible.

                                                    However it's possible that consumers without a sufficiently tiered plan aren't getting optimal performance, or that the benchmark is overfit and the results won't generalize well to the real tasks you're trying to do.

                                                      • mrandish

                                                        today at 10:22 PM

                                                        > I'd expect the numbers are all real.

                                                      I think a lot of people are concerned due to 1) significant variance in performance being reported by a large number of users, and 2) specific examples of OpenAI and other labs benchmaxxing in the recent past (https://grok.com/share/c2hhcmQtMw_66c34055-740f-43a3-a63c-4b...).

                                                        It's tricky because there are so many subtle ways in which "the numbers are all real" could be technically true in some sense, yet still not reflect what a customer will experience (eg harnesses, etc). And any of those ways can benefit the cost structures of companies currently subsidizing models well below their actual costs with limited investor capital. All with billions of dollars in potential personal wealth at stake for company employees and dozens of hidden cost/performance levers at their disposal.

                                                      And it doesn't even require overt deception on anyone's part. For example, the teams doing benchmark testing of unreleased new models aren't the same people as the ops teams managing global deployment/load balancing at scale day-to-day. If there aren't significant ongoing resources devoted to specifically validating that those two things remain in sync, they'll almost certainly drift apart. And it won't be anyone's job to even know it's happening until a meaningful number of important customers complain or sales start to fall. Of course, if an unplanned deviation causes costs to rise over budget, it's a high-priority bug to be addressed. But if the deviation goes the other way and costs are a little lower than expected, no one's getting a late-night incident alert. This isn't even a dig at OpenAI in particular; it's just the default state of how large orgs work.

                                                    • ifwinterco

                                                      today at 7:54 PM

                                                      On benchmarks GPT 5.2 was roughly equivalent to Opus 4.5 but most people who've used both for SWE stuff would say that Opus 4.5 is/was noticeably better

                                                        • CraigJPerry

                                                          today at 8:57 PM

                                                        There's an extended thinking mode for GPT 5.2; I forget the name of it right at this minute. It's super slow - a 3-minute Opus 4.5 prompt takes circa 12 minutes to complete in 5.2 on that super extended thinking mode - but it is not a close race in terms of results: GPT 5.2 wins by a handy margin in that mode. It's just too slow to be usable interactively, though.

                                                            • ifwinterco

                                                              today at 9:55 PM

                                                              Interesting, sounds like I definitely need to give the GPT models another proper go based on this discussion

                                                          • georgeven

                                                            today at 8:04 PM

                                                            Interesting. Everyone in my circle said the opposite.

                                                              • krzyk

                                                                today at 8:53 PM

                                                                It probably depends on programming language and expectations.

                                                                  • ifwinterco

                                                                    today at 9:51 PM

                                                                    This is mostly Python/TS for me... what Jonathan Blow would probably call not "real programming" but it pays the bills

                                                              They can both write fairly good idiomatic code, but in my experience Opus 4.5 is better at understanding overall project structure etc. without prompting. It just does things correctly the first time more often than Codex. I still don't trust it, obviously, but out of all LLMs it's the closest to actually starting to earn my trust.

                                                            • SatvikBeri

                                                              today at 10:05 PM

                                                              I pretty consistently heard people say Codex was much slower but produced better results, making it better for long-running work in the background, and worse for more interactive development.

                                                              • elAhmo

                                                                today at 8:00 PM

                                                                I mostly used Sonnet/Opus 4.x in the past months, but 5.2 Codex seemed to be on par or better for my use case in the past month. I tried a few models here and there but always went back to Claude, but with 5.2 Codex for the first time I felt it was very competitive, if not better.

                                                                Curious to see how things will be with 5.3 and 4.6

                                                            • smcleod

                                                              today at 9:37 PM

                                                              I don't think much from OpenAI can be trusted tbh.

                                                              • aaaalone

                                                                today at 7:19 PM

                                                              At the end of the day you test it on your own use cases anyway, but it's a great initial hint as to whether it's worth testing out.

                                                                • cyanydeez

                                                                  today at 7:36 PM

                                                                  When do you think we should run this benchmark? Friday, 1pm? Monday 8AM? Wednesday 11AM?

                                                                  I definitely suspect all these models are being degraded during heavy loads.

                                                                    • j_maffe

                                                                      today at 7:49 PM

                                                                      This hypothesis is tested regularly by plenty of live benchmarks. The services usually don't decay in performance.

                                                                  • thinkingtoilet

                                                                    today at 9:22 PM

                                                                  We know OpenAI already got caught getting benchmark data and tuning their models to it. So the answer is a hard no. I imagine over time it gives a general view of the landscape and improvements, but take it with a large grain of salt.

                                                                      • rvz

                                                                        today at 10:37 PM

                                                                    The same thing was done by Meta researchers with Llama 4; it shows what can go wrong when 'independent' researchers begin to game AI benchmarks. [0]

                                                                        You always have to question these benchmarks, especially when the in-house researchers can potentially game them if they wanted to.

                                                                        Which is why it must be independent.

                                                                        [0] https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-...

                                                                • purplerabbit

                                                                  today at 6:29 PM

                                                                  The lack of broad benchmark reports in this makes me curious: Has OpenAI reverted to benchmaxxing? Looking forward to hearing opinions once we all try both of these out

                                                                    • MallocVoidstar

                                                                      today at 7:24 PM

                                                                      The -codex models are only for 'agentic coding', nothing else.

                                                                        • dingnuts

                                                                          today at 7:33 PM

                                                                          [dead]

                                                                  • nharada

                                                                    today at 6:23 PM

                                                                    That's a massive jump, I'm curious if there's a materially different feeling in how it works or if we're starting to reach the point of benchmark saturation. If the benchmark is good then 10 points should be a big improvement in capability...

                                                                    • jkelleyrtp

                                                                      today at 6:27 PM

                                                                  Claude's SWE-bench is 80.8 and Codex's is 56.8.

                                                                      Seems like 4.6 is still all-around better?

                                                                        • gizmodo59

                                                                          today at 6:28 PM

                                                                    It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.

                                                                            • joshuahedlund

                                                                              today at 6:30 PM

                                                                              Any ideas why verified has stagnated? It was increasing rapidly and then basically stopped.

                                                                                • Snuggly73

                                                                                  today at 6:53 PM

                                                                        It has been pretty much a benchmark for memorization for a while; there is a paper on the subject somewhere.

                                                                        SWE-bench Pro public is newer, but it's not live, so it will slowly get memorized as well. The private dataset is more interesting, as are the results there:

                                                                                  https://scale.com/leaderboard/swe_bench_pro_private

                                                                          • Rudybega

                                                                            today at 9:59 PM

                                                                            You're comparing two different benchmarks. Pro vs Verified.

                                                                    • pjot

                                                                      today at 6:03 PM

                                                                      Claude Code release notes:

                                                                        > Version 2.1.32:
                                                                           • Claude Opus 4.6 is now available!
                                                                           • Added research preview agent teams feature for multi-agent collaboration (token-intensive feature, requires setting CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1)
                                                                           • Claude now automatically records and recalls memories as it works
                                                                           • Added "Summarize from here" to the message selector, allowing partial conversation summarization.
                                                                           • Skills defined in .claude/skills/ within additional directories (--add-dir) are now loaded automatically.
                                                                           • Fixed @ file completion showing incorrect relative paths when running from a subdirectory
                                                                           • Updated --resume to re-use --agent value specified in previous conversation by default.
                                                                           • Fixed: Bash tool no longer throws "Bad substitution" errors when heredocs contain JavaScript template literals like ${index + 1}, which previously interrupted tool execution
                                                                           • Skill character budget now scales with context window (2% of context), so users with larger context windows can see more skill descriptions without truncation
                                                                           • Fixed Thai/Lao spacing vowels (สระ า, ำ) not rendering correctly in the input field
                                                                           • VSCode: Fixed slash commands incorrectly being executed when pressing Enter with preceding text in the input field
                                                                           • VSCode: Added spinner when loading past conversations list

                                                                        • neuronexmachina

                                                                          today at 6:08 PM

                                                                          > Claude now automatically records and recalls memories as it works

                                                                          Neat: https://code.claude.com/docs/en/memory

                                                                          I guess it's kind of like Google Antigravity's "Knowledge" artifacts?

                                                                            • bityard

                                                                              today at 7:46 PM

                                                                              If it works anything like the memories on Copilot (which have been around for quite a while), you need to be pretty explicit about it being a permanent preference for it to be stored as a memory. For example, "Don't use emoji in your response" would only be relevant for the current chat session, whereas this is more sticky: "I never want to see emojis from you, you sub-par excuse for a roided-out spreadsheet"

                                                                                • flutas

                                                                                  today at 9:10 PM

                                                                                  It's a lot more iffy than that IME.

                                                                                  It's very happy to throw a lot into the memory, even if it doesn't make sense.

                                                                                  • 9dev

                                                                                    today at 8:56 PM

                                                                                    > you sub-par excuse for a roided-out spreadsheet

                                                                                    That’s harsh, man.

                                                                                • 4b11b4

                                                                                  today at 9:23 PM

                                                                                  I understand everyone's trying to solve this problem but I'm envisioning 1 year down the line when your memory is full of stuff that shouldn't be in there.

                                                                                  • om8

                                                                                    today at 6:23 PM

                                                                                     Is there a way to disable it? Sometimes I value the agent not having the knowledge it needs to cut corners.

                                                                                      • nerdsniper

                                                                                        today at 7:19 PM

                                                                                        90-98% of the time I want the LLM to only have the knowledge I gave it in the prompt. I'm actually kind of scared that I'll wake up one day and the web interface for ChatGPT/Opus/Gemini will pull information from my prior chats.

                                                                                          • pdntspa

                                                                                            today at 9:27 PM

                                                                                            They already do this

                                                                                             I've had Claude reference prior conversations when I'm trying to get technical help on thing A, and it will ask me if this conversation is because of thing B that we talked about in the immediate past.

                                                                                            • vineyardmike

                                                                                              today at 9:18 PM

                                                                                               All of these providers support this feature. I don't know about ChatGPT, but the rest are opt-in. I imagine with Gemini it'll be default-on soon enough, since it's consumer focused. Claude does constantly nag me to enable it though.

                                                                                              • hypercube33

                                                                                                today at 7:48 PM

                                                                                                I'm fairly sure OpenAI/GPT does pull prior information in the form of its memories

                                                                                                  • nerdsniper

                                                                                                    today at 7:50 PM

                                                                                                    Ah, that could explain why I've found myself using it the least.

                                                                                                • sharifhsn

                                                                                                  today at 7:53 PM

                                                                                                  Gemini has this feature but it’s opt-in.

                                                                                              • kzahel

                                                                                                today at 8:17 PM

                                                                                                Claude told me he can disable it by putting instructions in the MEMORY.md file to not use it. So only a soft disable AFAIK and you'd need to do it on each machine.

                                                                                            • pdntspa

                                                                                              today at 9:26 PM

                                                                                              I thought it was already doing this?

                                                                                              I asked Claude UI to clear its memory a little while back and hoo boy CC got really stupid for a couple of days

                                                                                              • codethief

                                                                                                today at 6:26 PM

                                                                                                Are we sure the docs page has been updated yet? Because that page doesn't say anything about automatic recording of memories.

                                                                                                  • neuronexmachina

                                                                                                    today at 7:39 PM

                                                                                                    Oh, quite right. I saw people mention MEMORY.md online and I assumed that was the doc for it, but it looks like it isn't.

                                                                                                • kzahel

                                                                                                  today at 8:20 PM

                                                                                                  I looked into it a bit. It stores memories near where it stores the JSONL session history. It's per-project (and specific to the machine). Claude pretty aggressively and frequently writes stuff in there. It uses MEMORY.md as sort of the index, and will write out other files for other topics (linking to them from the main MEMORY.md file).

                                                                                                  It gives you a convenient way to say "remember this bug for me, we should fix tomorrow". I'll be playing around with it more for sure.

                                                                                                  I asked Claude to give me a TLDR (condensed from its system prompt):

                                                                                                  ----

                                                                                                  Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations

                                                                                                  MEMORY.md is always injected into the system prompt; truncated after 200 lines, so keep it concise

                                                                                                  Separate topic files for detailed notes, linked from MEMORY.md

                                                                                                  What to record: problem constraints, strategies that worked/failed, lessons learned

                                                                                                  Proactive: when I hit a common mistake, check memory first - if nothing there, write it down

                                                                                                  Maintenance: update or remove memories that are wrong or outdated

                                                                                                  Organization: by topic, not chronologically

                                                                                                  Tools: use Write/Edit to update (so you always see the tool calls)
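
                                                                                                  A minimal sketch of reading that index, going only by the layout described above (the path and the 200-line truncation are as reported in this comment, not verified against the docs; the project folder name is hypothetical):

                                                                                                    from pathlib import Path

                                                                                                    # Per-project memory directory as described above (hypothetical project path).
                                                                                                    memory_dir = Path.home() / ".claude" / "projects" / "my-project" / "memory"
                                                                                                    index = memory_dir / "MEMORY.md"

                                                                                                    if index.exists():
                                                                                                        lines = index.read_text(encoding="utf-8").splitlines()
                                                                                                        # Reportedly only the first ~200 lines get injected into the prompt.
                                                                                                        print("\n".join(lines[:200]))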

                                                                                                    • ra7

                                                                                                      today at 10:18 PM

                                                                                                      > Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations

                                                                                                      I create a git worktree, start Claude Code in that tree, and delete after. I notice each worktree gets a memory directory in this location. So is memory fragmented and not combined for the "main" repo?

                                                                                          • simonw

                                                                                            today at 5:58 PM

                                                                                            The bicycle frame is a bit wonky but the pelican itself is great: https://gist.github.com/simonw/a6806ce41b4c721e240a4548ecdbe...

                                                                                              • stkai

                                                                                                today at 6:56 PM

                                                                                                Would love to find out they're overfitting for pelican drawings.

                                                                                                  • andy_ppp

                                                                                                    today at 7:16 PM

                                                                                                    Yes. Raccoon on a unicycle? Magpie on a pedalo?

                                                                                                  • theanonymousone

                                                                                                    today at 9:44 PM

                                                                                                    Even if not intentionally, it is probably leaking into training sets.

                                                                                                    • fragmede

                                                                                                      today at 7:54 PM

                                                                                                      The estimation I did 4 months ago:

                                                                                                      > there are approximately 200k common nouns in English, and then we square that, we get 40 billion combinations. At one second per, that's ~1200 years, but then if we parallelize it on a supercomputer that can do 100,000 per second that would only take 3 days. Given that ChatGPT was trained on all of the Internet and every book written, I'm not sure that still seems infeasible.

                                                                                                      https://news.ycombinator.com/item?id=45455786
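
                                                                                                      Quick check of the arithmetic in that quote (the 100,000-per-second throughput is the commenter's assumption):

                                                                                                        nouns = 200_000
                                                                                                        pairs = nouns ** 2                  # 40 billion noun-noun combinations

                                                                                                        seconds_per_year = 60 * 60 * 24 * 365
                                                                                                        print(pairs / seconds_per_year)     # ~1268 years at 1 combination/second

                                                                                                        rate = 100_000                      # assumed combinations per second
                                                                                                        print(pairs / rate / 86_400)        # roughly 4-5 days at that throughput

                                                                                                        print(pairs * 100)                  # ~4 trillion once ~100 prepositions are included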

                                                                                                        • eli

                                                                                                          today at 8:07 PM

                                                                                                          How would you generate a picture of Noun + Noun in the first place in order to train the LLM with what it would look like? What's happening during that 1 estimated second?

                                                                                                            • metalliqaz

                                                                                                              today at 9:51 PM

                                                                                                              it's pelicans all the way down

                                                                                                              • Terretta

                                                                                                                today at 8:39 PM

                                                                                                                This is why everyone trains their LLM on another LLM. It's all about the pelicans.

                                                                                                            • AnimalMuppet

                                                                                                              today at 9:28 PM

                                                                                                              But you need to also include the number of prepositions. "A pelican on a bicycle" is not at all the same as "a pelican inside a bicycle".

                                                                                                              There are estimated to be 100 or so prepositions in English. That gets you to 4 trillion combinations.

                                                                                                      • gcanyon

                                                                                                        today at 6:44 PM

                                                                                                        One aspect of this is that apparently most people can't draw a bicycle much better than this: they get the elements of the frame wrong, mess up the geometry, etc.

                                                                                                          • arionmiles

                                                                                                            today at 8:08 PM

                                                                                                            There's a research paper from the University of Liverpool, published in 2006, where researchers asked people to draw bicycles from memory, showing how people overestimate their understanding of basic things. It was a very fun and short read.

                                                                                                            It's called "The science of cycology: Failures to understand how everyday objects work" by Rebecca Lawson.

                                                                                                            https://link.springer.com/content/pdf/10.3758/bf03195929.pdf

                                                                                                              • devilcius

                                                                                                                today at 10:02 PM

                                                                                                                There’s also a great art/design project about exactly this. Gianluca Gimini asked hundreds of people to draw a bicycle from memory, and most of them got the frame, proportions, or mechanics wrong. https://www.gianlucagimini.it/portfolio-item/velocipedia/

                                                                                                                • rcxdude

                                                                                                                  today at 8:32 PM

                                                                                                                  A place I worked at used it as part of an interview question (it wasn't some pass/fail thing to get it 100% correct, and was partly a jumping off point to a different question). This was in a city where nearly everyone uses bicycles as everyday transportation. It was surprising how many supposedly mechanical-focused people who rode a bike everyday, even rode a bike to the interview, would draw a bike that would not work.

                                                                                                                    • gcanyon

                                                                                                                      today at 9:54 PM

                                                                                                                      I wish I had interviewed there. When I first read that people have a hard time with this I immediately sat down without looking at a reference and drew a bicycle. I could ace your interview.

                                                                                                                      • throwuxiytayq

                                                                                                                        today at 9:15 PM

                                                                                                                        This is why at my company in interviews we ask people to draw a CPU diagram. You'd be surprised how many supposedly-senior computer programmers would draw a processor that would not work.

                                                                                                                          • niobe

                                                                                                                            today at 9:25 PM

                                                                                                                            If I was asked that question in an interview to be a programmer I'd walk out. How many abstraction layers either side of your knowledge domain do you need to be an expert in? Further, being a good technologist of any kind is not about having arcane details at the tip of your frontal lobe, and a company worth working for would know that.

                                                                                                                            • gedy

                                                                                                                              today at 9:26 PM

                                                                                                                              That's reasonable in many cases, but I've had situations like this for senior UI and frontend positions where they don't ask UI or frontend questions, just their pet low-level questions. Some even snort that it's softball to ask UI questions or "they use whatever". It's like, yeah, no wonder your UI is shit and now you are hiring to clean it up.

                                                                                                                              • rsc

                                                                                                                                today at 9:46 PM

                                                                                                                                Raises hand.

                                                                                                                    • gnatolf

                                                                                                                      today at 7:03 PM

                                                                                                                      Absolutely. A technically correct bike is very hard to draw in SVG without going overboard in details

                                                                                                                    • nateglims

                                                                                                                      today at 8:33 PM

                                                                                                                      I just had an idea for an RLVR startup.

                                                                                                                      • cyanydeez

                                                                                                                        today at 7:32 PM

                                                                                                                        Yes, but obviously AGI will solve this by, _checks notes_ more TerraWatts!

                                                                                                                          • hackernudes

                                                                                                                            today at 7:43 PM

                                                                                                                            The word is terawatts unless you mean earth-based watts. OK then, it's confirmed, data centers in space!

                                                                                                                            • seanhunter

                                                                                                                              today at 7:41 PM

                                                                                                                              …in space!

                                                                                                                      • franze

                                                                                                                        today at 8:43 PM

                                                                                                                        Here's the animated version: https://claude.ai/public/artifacts/3db12520-eaea-4769-82be-7...

                                                                                                                          • gryfft

                                                                                                                            today at 8:48 PM

                                                                                                                            That's hilarious. It's so close!

                                                                                                                        • etwigg

                                                                                                                          today at 10:00 PM

                                                                                                                          If we do get paperclipped, I hope it is of the "cycling pelican" variety. Thanks for your important contribution to alignment Simon!

                                                                                                                          • einrealist

                                                                                                                            today at 6:10 PM

                                                                                                                            They trained for it. That's the +0.1!

                                                                                                                            • zahlman

                                                                                                                              today at 9:10 PM

                                                                                                                              Do you find that word choices like "generate" (as opposed to "create", "author", "write" etc.) influence the model's success?

                                                                                                                              Also, is it bad that I almost immediately noticed that both of the pelican's legs are on the same side of the bicycle, but I had to look up an image on Wikipedia to confirm that they shouldn't have long necks?

                                                                                                                              Also, have you tried iterating prompts on this test to see if you can get more realistic results? (How much does it help to make them look up reference images first?)

                                                                                                                              • beemboy

                                                                                                                                today at 8:57 PM

                                                                                                                                Isn't there a point at which it trains itself on these various outputs, or someone somewhere draws one and feeds it into the model so as to pass this benchmark?

                                                                                                                                • athrowaway3z

                                                                                                                                  today at 6:15 PM

                                                                                                                                  This benchmark inspired me to have codex/claude build a DnD battlemap tool with svg's.

                                                                                                                                  They got surprisingly far, but I did need to iterate a few times to have it build tools that would check for things like: don't put walls on roads or water (a rough sketch of that kind of check is below).

                                                                                                                                  What I think might be the next obstacle is self-knowledge. The new agents seem to have picked up ever more vocabulary about their context and compaction, etc.

                                                                                                                                  As a next benchmark you could try having 1 agent and tell it to use a coding agent (via tmux) to build you a pelican.
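
                                                                                                                                  A minimal sketch of the kind of validation check described above, in Python, assuming shapely is available and that the generated map is just a list of tagged shapes (all names here are made up for illustration):

                                                                                                                                      from shapely.geometry import LineString, Polygon

                                                                                                                                      def find_bad_walls(features):
                                                                                                                                          """Return wall features that overlap any road or water feature."""
                                                                                                                                          blockers = [f["geom"] for f in features if f["kind"] in ("road", "water")]
                                                                                                                                          return [
                                                                                                                                              f for f in features
                                                                                                                                              if f["kind"] == "wall" and any(f["geom"].intersects(b) for b in blockers)
                                                                                                                                          ]

                                                                                                                                      # Example: a wall crossing a river gets flagged, a wall on dry land does not.
                                                                                                                                      features = [
                                                                                                                                          {"kind": "water", "geom": Polygon([(0, 4), (10, 4), (10, 6), (0, 6)])},
                                                                                                                                          {"kind": "wall",  "geom": LineString([(5, 0), (5, 10)])},  # crosses the river
                                                                                                                                          {"kind": "wall",  "geom": LineString([(0, 0), (10, 0)])},  # stays on dry land
                                                                                                                                      ]
                                                                                                                                      assert len(find_bad_walls(features)) == 1

                                                                                                                                  The flagged features can then be handed back to the agent to fix, which is roughly the iterate-a-few-times loop described above.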

                                                                                                                                  • eaf7e281

                                                                                                                                    today at 6:11 PM

                                                                                                                                    There's no way they actually put work into training for this.

                                                                                                                                      • margalabargala

                                                                                                                                        today at 6:29 PM

                                                                                                                                        I suspect they're training on this.

                                                                                                                                        I asked Opus 4.6 for a pelican riding a recumbent bicycle and got this.

                                                                                                                                        https://i.imgur.com/UvlEBs8.png

                                                                                                                                          • WarmWash

                                                                                                                                            today at 6:54 PM

                                                                                                                                            It would be way, way better if they were benchmaxxing this. The pelican in the image (both images) has arms. Pelicans don't have arms, and a pelican riding a bike would use its wings.

                                                                                                                                              • ryandrake

                                                                                                                                                today at 7:43 PM

                                                                                                                                                Having briefly worked in the 3D Graphics industry, I don't even remotely trust benchmarks anymore. The minute someone's benchmark performance becomes a part of the public's purchasing decision, companies will pull out every trick in the book--clean or dirty--to benchmaxx their product. Sometimes at the expense of actual real-world performance.

                                                                                                                                                • seanhunter

                                                                                                                                                  today at 7:42 PM

                                                                                                                                                  Pelicans don’t ride bikes. You can’t have scruples about whether or not the image of a pelican riding a bike has arms.

                                                                                                                                                    • jevinskie

                                                                                                                                                      today at 7:48 PM

                                                                                                                                                      Wouldn’t any decent bike-riding pelican have a bike tailored to pelicans and their wings?

                                                                                                                                                        • actsasbuffoon

                                                                                                                                                          today at 9:23 PM

                                                                                                                                                          Sure, that’s one solution. You could also Isle of Dr Moreau your way to a pelican that can use a regular bike. The sky is the limit when you have no scruples.

                                                                                                                                                          • cinntaile

                                                                                                                                                            today at 8:03 PM

                                                                                                                                                            Now that would be a smart chat agent.

                                                                                                                                                • mrandish

                                                                                                                                                  today at 6:39 PM

                                                                                                                                                  Interesting that it seems better. Maybe something about adding a highly specific yet unusual qualifier focusing attention?

                                                                                                                                                  • riffraff

                                                                                                                                                    today at 7:53 PM

                                                                                                                                                    perhaps try a penny farthing?

                                                                                                                                                • KeplerBoy

                                                                                                                                                  today at 6:15 PM

                                                                                                                                                  There is no way they are not training on this.

                                                                                                                                                      • collinmanderson

                                                                                                                                                        today at 6:16 PM

                                                                                                                                                        I suspect they have generic SVG drawing that they focus on.

                                                                                                                                                    • fragmede

                                                                                                                                                      today at 7:49 PM

                                                                                                                                                      The people who work at Anthropic are aware of simonw and his test, and people aren't unthinking data-driven machines. However valid his test is or isn't, a better score on it is convincing. If it gets, say, 1,000 people to use Claude Code over Codex, how much would that be worth to Anthropic?

                                                                                                                                                      $200 * 1,000 = $200k/month.

                                                                                                                                                      I'm not saying they are, but claiming with such certainty that they aren't, when money is on the line, seems like a questionable conclusion unless you have some insider knowledge you'd like to share with the rest of the class.

                                                                                                                                                  • bityard

                                                                                                                                                    today at 7:37 PM

                                                                                                                                                    Well, the clouds are upside-down, so I don't think I can give it a pass.

                                                                                                                                                    • hoeoek

                                                                                                                                                      today at 6:07 PM

                                                                                                                                                      This really is my favorite benchmark

                                                                                                                                                      • nine_k

                                                                                                                                                        today at 7:43 PM

                                                                                                                                                        I suppose the pelican must now be specifically trained for, since it's a well-known benchmark.

                                                                                                                                                        • copilot_king_2

                                                                                                                                                          today at 6:17 PM

                                                                                                                                                          I'm firing all of my developers this afternoon.

                                                                                                                                                            • RGamma

                                                                                                                                                              today at 7:11 PM

                                                                                                                                                              Opus 6 will fire you instead for being too slow with the ideas.

                                                                                                                                                              • insane_dreamer

                                                                                                                                                                today at 8:58 PM

                                                                                                                                                                Too late. You’ve already been fired by a moltbot agent from your PHB.

                                                                                                                                                            • 7777777phil

                                                                                                                                                              today at 6:13 PM

                                                                                                                                                              best pelican so far would you say? Or where does it rank in the pelican benchmark?

                                                                                                                                                                • mrandish

                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                  In other words, is it a pelican or a pelican't?

                                                                                                                                                                    • canadiantim

                                                                                                                                                                      today at 9:07 PM

                                                                                                                                                                      You’ve been sitting on that pun just waiting for it to take flight

                                                                                                                                                              • nubg

                                                                                                                                                                today at 6:04 PM

                                                                                                                                                                What about the Pelo2 benchmark? (the gray bird that is not gray)

                                                                                                                                                                • 6thbit

                                                                                                                                                                  today at 8:23 PM

                                                                                                                                                                  Do you have a gif? I need an evolving pelican gif.

                                                                                                                                                                  • risyachka

                                                                                                                                                                    today at 8:06 PM

                                                                                                                                                                    Pretty sure at this point they train it on pelicans

                                                                                                                                                                    • ares623

                                                                                                                                                                      today at 6:02 PM

                                                                                                                                                                      Can it draw a different bird on a bike?

                                                                                                                                                                    • DetroitThrow

                                                                                                                                                                      today at 6:01 PM

                                                                                                                                                                      The ears on top are a cute touch

                                                                                                                                                                      • iujasdkjfasf

                                                                                                                                                                        today at 9:20 PM

                                                                                                                                                                        [dead]

                                                                                                                                                                        • behnamoh

                                                                                                                                                                          today at 6:35 PM

                                                                                                                                                                          [flagged]

                                                                                                                                                                            • smokel

                                                                                                                                                                              today at 7:25 PM

                                                                                                                                                                              I'll bite. The benchmark is actually pretty good. It shows in an extremely comprehensible way how far LLMs have come. Someone not in the know has a hard time understanding what 65.4% means on "Terminal-Bench 2.0". Comparing some crappy pelicans on bicycles is a lot easier.
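
                                                                                                                                                                              For anyone who wants to run the test themselves, a minimal sketch using the Anthropic Python SDK; the model id below is a placeholder, so substitute whatever your console lists:

                                                                                                                                                                                  import anthropic

                                                                                                                                                                                  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

                                                                                                                                                                                  message = client.messages.create(
                                                                                                                                                                                      model="claude-opus-4-5",  # placeholder model id
                                                                                                                                                                                      max_tokens=4096,
                                                                                                                                                                                      messages=[{"role": "user",
                                                                                                                                                                                                 "content": "Generate an SVG of a pelican riding a bicycle"}],
                                                                                                                                                                                  )

                                                                                                                                                                                  # The reply may wrap the SVG in prose, so keep only the <svg>...</svg> part.
                                                                                                                                                                                  text = message.content[0].text
                                                                                                                                                                                  svg = text[text.find("<svg"):text.rfind("</svg>") + len("</svg>")]
                                                                                                                                                                                  with open("pelican.svg", "w") as f:
                                                                                                                                                                                      f.write(svg)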

                                                                                                                                                                                • blibble

                                                                                                                                                                                  today at 8:50 PM

                                                                                                                                                                                  it ceases to be a useful benchmark of general ability when you post it publicly for them to train against

                                                                                                                                                                              • quinnjh

                                                                                                                                                                                today at 7:26 PM

                                                                                                                                                                                The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh.

                                                                                                                                                                                Would you mind sharing which benchmarks you think are useful measures for multimodal reasoning?

                                                                                                                                                                                  • techpression

                                                                                                                                                                                    today at 8:50 PM

                                                                                                                                                                                    A benchmark only tests what the benchmark is doing; the goal is to make that task correlate with actually valuable things. Graphics benchmarks are a good example: it's extremely hard to know what you will get in a game by looking at 3DMark scores, since it varies by a lot. Making an SVG of a single thing doesn't help much unless that applies to all SVG tasks.

                                                                                                                                                                            • fullstackchris

                                                                                                                                                                              today at 9:21 PM

                                                                                                                                                                              [flagged]

                                                                                                                                                                                • dang

                                                                                                                                                                                  today at 10:18 PM

                                                                                                                                                                                  Personal attacks are not allowed on HN. No more of this, please.

                                                                                                                                                                          • surajkumar5050

                                                                                                                                                                            today at 8:31 PM

                                                                                                                                                                            I think two things are getting conflated in this discussion.

                                                                                                                                                                            First: marginal inference cost vs total business profitability. It’s very plausible (and increasingly likely) that OpenAI/Anthropic are profitable on a per-token marginal basis, especially given how cheap equivalent open-weight inference has become. Third-party providers are effectively price-discovering the floor for inference.

                                                                                                                                                                            Second: model lifecycle economics. Training costs are lumpy, front-loaded, and hard to amortize cleanly. Even if inference margins are positive today, the question is whether those margins are sufficient to pay off the training run before the model is obsoleted by the next release. That’s a very different problem than “are they losing money per request”.

                                                                                                                                                                            Both sides here can be right at the same time: inference can be profitable, while the overall model program is still underwater. Benchmarks and pricing debates don’t really settle that, because they ignore cadence and depreciation.

                                                                                                                                                                            IMO the interesting question isn’t “are they subsidizing inference?” but “how long does a frontier model need to stay competitive for the economics to close?”
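
                                                                                                                                                                            A toy back-of-the-envelope sketch of that lifecycle question, with entirely made-up numbers, just to make the framing concrete:

                                                                                                                                                                                def months_to_recoup(training_cost, monthly_inference_revenue, serving_cost_fraction):
                                                                                                                                                                                    """Months of inference margin needed to pay back one training run."""
                                                                                                                                                                                    monthly_margin = monthly_inference_revenue * (1 - serving_cost_fraction)
                                                                                                                                                                                    return training_cost / monthly_margin

                                                                                                                                                                                # Hypothetical inputs: a $500M training run, $300M/month of inference revenue,
                                                                                                                                                                                # and 40% of that revenue spent on serving (i.e. a 60% gross margin).
                                                                                                                                                                                needed = months_to_recoup(500e6, 300e6, 0.40)
                                                                                                                                                                                print(f"needs ~{needed:.1f} months of competitiveness")  # ~2.8 months

                                                                                                                                                                            The program only closes if the model stays competitive longer than that, i.e. if the release cadence doesn't obsolete it first.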

                                                                                                                                                                              • raincole

                                                                                                                                                                                today at 9:02 PM

                                                                                                                                                                                > the interesting question isn’t “are they subsidizing inference?”

                                                                                                                                                                                The interesting question is if they are subsidizing the $200/mo plan. That's what is supporting the whole vibecoding/agentic coding thing atm. I don't believe Claude Code would have taken off if it were token-by-token from day 1.

                                                                                                                                                                                (My baseless bet is that they are, but not by much, and the price will eventually rise by perhaps 2x but not 10x.)

                                                                                                                                                                                • jmalicki

                                                                                                                                                                                  today at 8:45 PM

                                                                                                                                                                                  I suspect they're marginally profitable on API cost plans.

                                                                                                                                                                                  But I am more skeptical of the Max 20x usage plans. When we're getting used to $200 or $400 per month per developer for aggressive AI-assisted coding, what happens when those costs go up 20x? What is now $5k/yr to keep a Codex and a Claude super busy and do efficient engineering suddenly becomes $100k/yr... will the costs come down before then? Is the current "vibe-coding renaissance" sustainable in that regime?

                                                                                                                                                                                    • slopusila

                                                                                                                                                                                      today at 10:33 PM

                                                                                                                                                                                      After the models get good enough to replace coders, they will be able to start raising subscription prices back up.

                                                                                                                                                                                  • BosunoB

                                                                                                                                                                                    today at 8:52 PM

                                                                                                                                                                                    Dario said this in a podcast somewhere. The models themselves have so far been profitable if you look at their lifetime costs and revenue. Annual profitability just isn't a very good lens for AI companies because costs all land in one year and the revenue all comes in the next. Prolific AI haters like Ed Zitron make this mistake all the time.

                                                                                                                                                                                      • jmalicki

                                                                                                                                                                                        today at 9:06 PM

                                                                                                                                                                                        Do you have a specific reference? I'm curious to see hard data and models.... I think this makes sense, but I haven't figured out how to see the numbers or think about it.

                                                                                                                                                                                          • BosunoB

                                                                                                                                                                                            today at 9:44 PM

                                                                                                                                                                                            I was able to find the podcast. Question is at 33:30. He doesn't give hard data but he explains his reasoning.

                                                                                                                                                                                            https://youtu.be/mYDSSRS-B5U

                                                                                                                                                                                            • jmatthiass

                                                                                                                                                                                              today at 10:06 PM

                                                                                                                                                                                              In his recent appearance on NYT Dealbook, he definitely made it seem like inference was sustainable, if not flat-out profitable.

                                                                                                                                                                                              https://www.youtube.com/live/FEj7wAjwQIk

                                                                                                                                                                                          • rstuart4133

                                                                                                                                                                                            today at 9:13 PM

                                                                                                                                                                                            > It’s very plausible (and increasingly likely) that OpenAI/Anthropic are profitable on a per-token marginal basis

                                                                                                                                                                                            There are many places that will not use models running on hardware provided by OpenAI / Anthropic. That is true of my (the Australian) government at all levels. They will only use models running in Australia.

                                                                                                                                                                                            Consequently AWS (and I presume others) will run models supplied by the AI companies for you in their data centres. They won't be doing that at a loss, so the price will cover the marginal cost of the compute plus renting the model. I know from devs using and deploying the service that demand outstrips supply. Ergo, I don't think there is much doubt that they are making money from inference.

                                                                                                                                                                                            • w10-1

                                                                                                                                                                                              today at 9:17 PM

                                                                                                                                                                                              "how long does a frontier model need to stay competitive"

                                                                                                                                                                                              Remember "worse is better". The model doesn't have to be the best; it just has to be mostly good enough, and used by everyone -- i.e., where switching costs would be higher than any increase in quality. Enterprises would still be on Java if native containers weren't so much cheaper to operate.

                                                                                                                                                                                              So it can make sense to be ok with losing money with each training generation initially, particularly when they are being driven by specific use-cases (like coding). To the extent they are specific, there will be more switching costs.

                                                                                                                                                                                          • jonatron

                                                                                                                                                                                            today at 10:50 PM

                                                                                                                                                                                            Can someone ask: "what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?" just to see how well it can look up information in BS 7671?

                                                                                                                                                                                              • jaggederest

                                                                                                                                                                                                today at 10:52 PM

                                                                                                                                                                                                > what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?

                                                                                                                                                                                                  This is an electrical engineering question rather than a software engineering one, but let me look it up for you.
                                                                                                                                                                                                
                                                                                                                                                                                                   WEB SEARCH ("current carrying capacity 25mm multicore armoured thermoplastic insulated cable aluminium conductors perforated cable tray BS 7671")
                                                                                                                                                                                                   ↳ Found 10 results.
                                                                                                                                                                                                
                                                                                                                                                                                                  Based on BS 7671 Table 4D4B (multicore armoured thermoplastic insulated cables with aluminium conductors), the current-carrying capacity for a 25mm cable on a perforated cable tray (Reference Method E) is approximately 80A for a 3- or 4-core cable at 30C ambient temperature and 70C conductor operating temperature.
                                                                                                                                                                                                
                                                                                                                                                                                                   Note: The exact value depends on the number of cores (2-core vs 3/4-core) and whether any rating/correction factors (grouping, ambient temperature, thermal insulation) apply. I'd recommend confirming against the actual BS 7671:2018 Table 4D4B for your specific configuration, as the aluminium conductor ratings are lower than the copper equivalents (~102A for copper in the same conditions).
                                                                                                                                                                                                
                                                                                                                                                                                                   This is outside my core domain of software engineering -- for critical electrical installations, always verify with the relevant standards documentation or a qualified electrical engineer.

                                                                                                                                                                                            • legitster

                                                                                                                                                                                              today at 6:05 PM

                                                                                                                                                                                              I'm still not sure I understand Anthropic's general strategy right now.

                                                                                                                                                                                              They are doing these broad marketing programs trying to take on ChatGPT for "normies". And yet their bread and butter is still clearly coding.

                                                                                                                                                                                              Meanwhile, Claude's general use cases are... fine. For generic research topics, I find that ChatGPT and Gemini run circles around it: in the depth of research, the type of tasks it can handle, and the quality and presentation of the responses.

                                                                                                                                                                                              Anthropic is also doing all of these goofy things to try to establish the "humanity" of their chatbot - giving it rights and a constitution and all that. Yet it weirdly feels the most transactional out of all of them.

                                                                                                                                                                                              Don't get me wrong, I'm a paying Claude customer and love what it's good at. I just think there's a disconnect between what Claude is and what their marketing department thinks it is.

                                                                                                                                                                                                • tgtweak

                                                                                                                                                                                                  today at 6:09 PM

                                                                                                                                                                                                  Claude itself (outside of code workflows) actually works very well for general-purpose chat. I have a few non-technical friends who have moved over from ChatGPT after some side-by-side testing, and I've yet to see one go back, which is good since Claude circa 8 months ago was borderline unusable for anything but coding on the API.

                                                                                                                                                                                                  • eaf7e281

                                                                                                                                                                                                    today at 6:16 PM

                                                                                                                                                                                                    I kinda agree. Their model just doesn't feel "daily" enough. I would use it for any "agentic" tasks and for using tools, but definitely not for day to day questions.

                                                                                                                                                                                                      • lukebechtel

                                                                                                                                                                                                        today at 6:22 PM

                                                                                                                                                                                                        Why? I use it for all and love it.

                                                                                                                                                                                                        That doesn't mean you have to, but I'm curious why you think it's behind in the personal assistant game.

                                                                                                                                                                                                          • legitster

                                                                                                                                                                                                            today at 6:41 PM

                                                                                                                                                                                                            I have three specific use cases where I try both but ChatGPT wins:

                                                                                                                                                                                                            - Recipes and cooking: ChatGPT just has way more detailed and practical advice. It also thinks outside of the box much more, whereas Claude gets stuck in a rut and sticks very closely to your prompt. And ChatGPT's easier to understand/skim writing style really comes in useful.

                                                                                                                                                                                                            - Travel and itinerary: Again, ChatGPT can anticipate details much more, and give more unique suggestions. I am much more likely to find hidden gems or get good time-savers than Claude, which often feels like it is just rereading Yelp for you.

                                                                                                                                                                                                            - Historical research: ChatGPT wins on this by a mile. You can tell ChatGPT has been trained on actual historical texts and physical books. You can track long historical trends, pull examples and quotes, and even give you specific book or page(!) references of where to check the sources. Meanwhile, all Claude will give you is a web search on the topic.

                                                                                                                                                                                                              • aggie

                                                                                                                                                                                                                today at 7:59 PM

                                                                                                                                                                                                                How does #3 square with Anthropic's literal warehouse full of books we've seen from the copyright case? Did OpenAI scan more books? Or did they take a shadier route of training on digital books despite copyright issues, but end up with a deeper library?

                                                                                                                                                                                                                  • legitster

                                                                                                                                                                                                                    today at 10:27 PM

                                                                                                                                                                                                                    I have no idea, but I suspect there's a difference between using books to train an LLM to reproduce text/writing styles and being able to actually recall the knowledge in said books.

                                                                                                                                                                                                                    • rolisz

                                                                                                                                                                                                                      today at 8:10 PM

                                                                                                                                                                                                                      I think they bought the books after they were caught pirating them and lost that case (because they pirated, not because of copyright).

                                                                                                                                                                                                              • eaf7e281

                                                                                                                                                                                                                today at 9:25 PM

                                                                                                                                                                                                                It's hard to say. Maybe it has to do with the way Claude responds or the lack of "thinking" compared to other models. I personally love Claude and it's my only subscription right now, but it just feels weird compared to the others as a personal assistant.

                                                                                                                                                                                                            • quietsegfault

                                                                                                                                                                                                              today at 10:40 PM

                                                                                                                                                                                                              Claude is far superior for daily chat. I have to work hard to get it to not learn how to work around various bad behaviors I have but don’t want to change.

                                                                                                                                                                                                              • solarkraft

                                                                                                                                                                                                                today at 6:43 PM

                                                                                                                                                                                                                But that’s what makes it so powerful (yeah, mixing model and frontend discussion here yet again). I have yet to see a non-DIY product that can so effortlessly call tens of tools by different providers to satisfy your request.

                                                                                                                                                                                                            • Squarex

                                                                                                                                                                                                              today at 8:59 PM

                                                                                                                                                                                                              Claude sucks at non-English languages. Gemini and ChatGPT are much better; Grok is the worst. I am a native Czech speaker: Claude makes up words, and Grok sometimes responds in Russian. So while I love it for coding, it's unusable for general purpose for me.

                                                                                                                                                                                                                • 9dev

                                                                                                                                                                                                                  today at 9:05 PM

                                                                                                                                                                                                                  > Grok sometimes responds in Russian

                                                                                                                                                                                                                  Geopolitically speaking this is hilarious.

                                                                                                                                                                                                                    • Squarex

                                                                                                                                                                                                                      today at 9:58 PM

                                                                                                                                                                                                                      The voice mode sounded like a Ukrainian trying to speak Czech. I don’t think it means anything.

                                                                                                                                                                                                                  • jorl17

                                                                                                                                                                                                                    today at 10:07 PM

Claude is quite good at European Portuguese in my limited tests. Gemini 3 is also very good. ChatGPT is just OK and keeps code-switching all the time, which is very bizarre.

                                                                                                                                                                                                                    I used to think of Gemini as the lead in terms of Portuguese, but recently subjectively started enjoying Claude more (even before Opus 4.5).

                                                                                                                                                                                                                    In spite of this, ChatGPT is what I use for everyday conversational chat because it has loads of memories there, because of the top of the line voice AI, and, mostly, because I just brainstorm or do 1-off searches with it. I think effectively ChatGPT is my new Google and first scratchpad for ideas.

                                                                                                                                                                                                                    • kuboble

                                                                                                                                                                                                                      today at 9:58 PM

                                                                                                                                                                                                                      Claude code (opus) is very good in Polish.

I sometimes vibe code in Polish and it's as good as with English for me. It speaks natural, native-level Polish.

I used Opus to translate thousands of strings in my app into Polish, Korean, and two Chinese dialects. The Polish one is great, and the others are also good according to my customers.

                                                                                                                                                                                                              • blibble

                                                                                                                                                                                                                today at 5:52 PM

                                                                                                                                                                                                                > We build Claude with Claude. Our engineers write code with Claude Code every day

                                                                                                                                                                                                                well that explains quite a bit

                                                                                                                                                                                                                  • jsheard

                                                                                                                                                                                                                    today at 5:58 PM

                                                                                                                                                                                                                    CC has >6000 open issues, despite their bot auto-culling them after 60 days of inactivity. It was ~5800 when I looked just a few days ago so they seem to be accelerating towards some kind of bug singularity.

                                                                                                                                                                                                                      • dkersten

                                                                                                                                                                                                                        today at 8:33 PM

                                                                                                                                                                                                                        Just anecdotally, each release seems to be buggier than the last.

                                                                                                                                                                                                                        To me, their claim that they are vibe coding Claude code isn’t the flex they think it is.

I find it harder and harder to trust Anthropic for business-related use rather than just hobby tinkering. Between buggy releases, opaque and often seemingly glitchy rate limits and usage limits, and inconsistent model quality, it's just not something I'd want to bet a business on.

                                                                                                                                                                                                                          • zahlman

                                                                                                                                                                                                                            today at 9:15 PM

                                                                                                                                                                                                                            I think I would be much more frightened if it were working well.

                                                                                                                                                                                                                              • ifwinterco

                                                                                                                                                                                                                                today at 10:09 PM

                                                                                                                                                                                                                                Exactly, thank goodness it's still a bit rubbish in some aspects

                                                                                                                                                                                                                        • tgtweak

                                                                                                                                                                                                                          today at 6:08 PM

Plot twist: it's all Claude Code instances submitting bug reports on behalf of end users.

                                                                                                                                                                                                                            • accrual

                                                                                                                                                                                                                              today at 6:30 PM

                                                                                                                                                                                                                              It's Claude, all the way down.

                                                                                                                                                                                                                          • elAhmo

                                                                                                                                                                                                                            today at 8:01 PM

                                                                                                                                                                                                                            Insane to think that a relatively simple CLI tool has so many open issues...

                                                                                                                                                                                                                              • emilsedgh

                                                                                                                                                                                                                                today at 8:13 PM

It's not really a simple CLI tool, though; it's highly interactive.

                                                                                                                                                                                                                                • trymas

                                                                                                                                                                                                                                  today at 8:17 PM

                                                                                                                                                                                                                                  What’s so simple about it?

                                                                                                                                                                                                                                    • elAhmo

                                                                                                                                                                                                                                      today at 8:38 PM

I said relatively simple. It is mostly an API interface to Anthropic models, with tool calling on top of it and very simple input and output.

                                                                                                                                                                                                                                        • brookst

                                                                                                                                                                                                                                          today at 9:15 PM

                                                                                                                                                                                                                                          With extensibility via plugins, MCP (stdio and http), UI to prompt the user for choices and redirection, tools to manage and view context, and on and on.

                                                                                                                                                                                                                                          It is not at all a small app, at least as far as UX surface area. There are, what, 40ish slash commands? Each one is an opportunity for bugs and feature gaps.

                                                                                                                                                                                                                                          • 9dev

                                                                                                                                                                                                                                            today at 8:58 PM

I'm pretty certain you haven't used it (to its fullest extent) yet, then. Claude Code is easily one of the most complex terminal UIs I have seen.

                                                                                                                                                                                                                                              • dvfjsdhgfv

                                                                                                                                                                                                                                                today at 9:15 PM

                                                                                                                                                                                                                                                Could you explain why? When I think about complex TUIs, I think about things we were building with Turbo Vision in the 90s.

                                                                                                                                                                                                                                                  • gorbypark

                                                                                                                                                                                                                                                    today at 9:56 PM

                                                                                                                                                                                                                                                    I’m going to buck the trend and say it’s really not that complex. AFAIK they are using Ink, which is React with a TUI renderer.

Cue the "I could build it in a weekend" vibes: I built my own agent TUI using the OpenAI agent SDK and Ink. Of course it's not as fleshed out as Claude, but it supports git worktrees for multi-agent work, slash commands, human-in-the-loop prompts, etc. If I point it at the Anthropic models it produces results more or less as good as the real Claude TUI.

I actually "decompiled" the Claude tools and prompts and recreated them. As of 6 months ago Claude was 15 tools, mostly pretty basic (list dir, read file, write file, bash, etc.) with some very clever prompts, especially the task tool it uses to produce the quasi-planning-mode task bullets (even when not in planning mode).

                                                                                                                                                                                                                                                    Honestly the idea of bringing this all together with an affordable monthly service and obviously some seriously creative “prompt engineers” is the magic/hard part (and making the model itself, obviously).
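
For a concrete sense of what one of those basic tools might look like, here is a minimal sketch of a "read file" tool in the OpenAI function-calling format, in TypeScript. The names, schema, and handler are illustrative assumptions, not Claude Code's actual tool definitions.

    // Hypothetical sketch of a minimal "read file" tool for an agent loop,
    // written against the OpenAI chat-completions function-calling format.
    import { readFile } from "node:fs/promises";

    // JSON-schema description the model sees when deciding whether to call the tool.
    export const readFileTool = {
      type: "function",
      function: {
        name: "read_file", // hypothetical name, not Claude Code's actual tool name
        description: "Read a UTF-8 text file from the workspace and return its contents.",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string", description: "Path relative to the project root" },
          },
          required: ["path"],
        },
      },
    } as const;

    // Handler the agent loop runs when the model emits a read_file tool call.
    export async function handleReadFile(args: { path: string }): Promise<string> {
      try {
        return await readFile(args.path, "utf8");
      } catch (err) {
        // Return the error as text so the model can react to it instead of crashing the loop.
        return `Error reading ${args.path}: ${(err as Error).message}`;
      }
    }

The rest of such a tool set (write file, bash, directory listing) follows the same pattern: a small JSON schema the model can target plus a local handler that returns plain text.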

                                                                                                                                                                                                                                    • dwaltrip

                                                                                                                                                                                                                                      today at 9:00 PM

                                                                                                                                                                                                                                      sips coffee… ahh yes, let me find that classic Dropbox rsync comment

                                                                                                                                                                                                                                  • paxys

                                                                                                                                                                                                                                    today at 6:15 PM

                                                                                                                                                                                                                                    Half of them were probably opened yesterday during the Claude outage.

                                                                                                                                                                                                                                      • anematode

                                                                                                                                                                                                                                        today at 6:51 PM

                                                                                                                                                                                                                                        Nah, it was at like 5500 before.

                                                                                                                                                                                                                                • raincole

                                                                                                                                                                                                                                  today at 6:08 PM

                                                                                                                                                                                                                                  It explains how important dogfooding is if you want to make an extremely successful product.

                                                                                                                                                                                                                                  • jama211

                                                                                                                                                                                                                                    today at 6:05 PM

                                                                                                                                                                                                                                    It’s extremely successful, not sure what it explains other than your biases

                                                                                                                                                                                                                                      • blibble

                                                                                                                                                                                                                                        today at 6:08 PM

                                                                                                                                                                                                                                        Microsoft's products are also extremely successful

                                                                                                                                                                                                                                        they're also total garbage

                                                                                                                                                                                                                                          • simianwords

                                                                                                                                                                                                                                            today at 6:23 PM

                                                                                                                                                                                                                                            but they have the advantage of already being a big company. Anthropic is new and there's no reason for people to use it

                                                                                                                                                                                                                                              • kuboble

                                                                                                                                                                                                                                                today at 10:11 PM

The tool is an absolutely fantastic coding assistant. That's why I use it.

The number of non-critical bugs all over the place is at least an order of magnitude larger than in any software I've ever used daily.

Plenty of built-in /commands don't work. Sometimes it accepts keystrokes with one-second delays. It often scrolls hundreds of lines in the console after each keystroke. Every now and then it crashes completely and is unrecoverable (I once gave up and installed a fresh WSL). And when you ask it a question in plan mode, it is somewhat of an art to find the answer, because after answering it will dump the whole current plan (three screens of text).

                                                                                                                                                                                                                                                And just in general the technical feeling of the TUI is that of a vibe coded project that got too big to control.

                                                                                                                                                                                                                                                • Izikiel43

                                                                                                                                                                                                                                                  today at 9:57 PM

What if management gives them a reason? You can imagine what those reasons might be.

                                                                                                                                                                                                                                              • holoduke

                                                                                                                                                                                                                                                today at 8:27 PM

                                                                                                                                                                                                                                                Claude is by far the most popular and best assistant currently available for a developer.

                                                                                                                                                                                                                                                  • wavemode

                                                                                                                                                                                                                                                    today at 8:43 PM

                                                                                                                                                                                                                                                    Okay, and Windows is by far the most popular desktop operating system.

                                                                                                                                                                                                                                                    Discussions are pointless when the parties are talking past each other.

                                                                                                                                                                                                                                                      • pluralmonad

                                                                                                                                                                                                                                                        today at 8:56 PM

                                                                                                                                                                                                                                                        Popular meaning lots of people like it or that it is relatively widespread? Polio used to be popular in the latter way.

                                                                                                                                                                                                                                                          • quietsegfault

                                                                                                                                                                                                                                                            today at 10:42 PM

I like Windows, it's fine. I like macOS better. I like Linux. None of them are garbage or unusable.

                                                                                                                                                                                                                                                              • blibble

                                                                                                                                                                                                                                                                today at 10:48 PM

                                                                                                                                                                                                                                                                have you used Windows 11?

                                                                                                                                                                                                                                                                file explorer takes 5 seconds to open

                                                                                                                                                                                                                                            • acedTrex

                                                                                                                                                                                                                                              today at 7:18 PM

                                                                                                                                                                                                                                              Something being successful and something being a high quality product with good engineering are two completely different questions.

                                                                                                                                                                                                                                              • mvdtnz

                                                                                                                                                                                                                                                today at 6:38 PM

                                                                                                                                                                                                                                                Anthropic has perhaps the most embarrassing status page history I have ever seen. They are famous for downtime.

                                                                                                                                                                                                                                                https://status.claude.com/

                                                                                                                                                                                                                                                  • ronsor

                                                                                                                                                                                                                                                    today at 6:46 PM

                                                                                                                                                                                                                                                    As opposed to other companies which are smart enough not to report outages.

                                                                                                                                                                                                                                                      • tavavex

                                                                                                                                                                                                                                                        today at 7:29 PM

                                                                                                                                                                                                                                                        So, there are only two types of companies: ones that have constant downtime, and ones that have constant downtime but hide it, right?

                                                                                                                                                                                                                                                          • Sebguer

                                                                                                                                                                                                                                                            today at 7:48 PM

                                                                                                                                                                                                                                                            Basically, yes.

                                                                                                                                                                                                                                                    • djeastm

                                                                                                                                                                                                                                                      today at 9:05 PM

                                                                                                                                                                                                                                                      The best way to use Claude's models seems to be some other inference provider (either OpenRouter or directly)
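
As a rough sketch of what "some other inference provider" looks like in practice: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so calling a Claude model through it is a single HTTP request. The model slug below is an assumption; check OpenRouter's model list for the exact id.

    // Minimal sketch: calling a Claude model via OpenRouter's OpenAI-compatible API.
    const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

    async function askClaudeViaOpenRouter(prompt: string): Promise<string> {
      const res = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "anthropic/claude-opus-4.5", // assumed slug; verify on openrouter.ai
          messages: [{ role: "user", content: prompt }],
        }),
      });
      if (!res.ok) throw new Error(`OpenRouter error: ${res.status} ${await res.text()}`);
      const data: any = await res.json();
      return data.choices[0].message.content;
    }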

                                                                                                                                                                                                                                                      • Computer0

                                                                                                                                                                                                                                                        today at 8:02 PM

                                                                                                                                                                                                                                                        The competition doesn't currently have all 99's - https://status.openai.com/

                                                                                                                                                                                                                                                        • dimgl

                                                                                                                                                                                                                                                          today at 6:42 PM

                                                                                                                                                                                                                                                          And yet people still use them.

                                                                                                                                                                                                                                                  • quietsegfault

                                                                                                                                                                                                                                                    today at 10:40 PM

                                                                                                                                                                                                                                                    What does it explain, oh snark master supreme?

                                                                                                                                                                                                                                                    • cedws

                                                                                                                                                                                                                                                      today at 7:04 PM

                                                                                                                                                                                                                                                      The sandboxing in CC is an absolute joke, it's no wonder there's an explosion of sandbox wrappers at the moment. There's going to be a security catastrophe at some point, no doubt about it.

                                                                                                                                                                                                                                                      • gjsman-1000

                                                                                                                                                                                                                                                        today at 5:53 PM

                                                                                                                                                                                                                                                        Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

                                                                                                                                                                                                                                                          • krystofbe

                                                                                                                                                                                                                                                            today at 8:31 PM

                                                                                                                                                                                                                                                            I did some debugging on this today. The results are... sobering.

                                                                                                                                                                                                                                                            Memory comparison of AI coding CLIs (single session, idle):

                                                                                                                                                                                                                                                              | Tool        | Footprint | Peak   | Language      |
                                                                                                                                                                                                                                                              |-------------|-----------|--------|---------------|
                                                                                                                                                                                                                                                              | Codex       | 15 MB     | 15 MB  | Rust          |
                                                                                                                                                                                                                                                              | OpenCode    | 130 MB    | 130 MB | Go            |
                                                                                                                                                                                                                                                              | Claude Code | 360 MB    | 746 MB | Node.js/React |
                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                            That's a 24x to 50x difference for tools that do the same thing: send text to an API.

vmmap shows Claude Code reserves 32.8 GB of virtual memory just for the V8 heap, has 45% malloc fragmentation, and holds a peak footprint of 746 MB that never gets released, which is a classic leak pattern.

                                                                                                                                                                                                                                                            On my 16 GB Mac, a "normal" workload (2 Claude sessions + browser + terminal) pushes me into 9.5 GB swap within hours. My laptop genuinely runs slower with Claude Code than when I'm running local LLMs.

                                                                                                                                                                                                                                                            I get that shipping fast matters, but building a CLI with React and a full Node.js runtime is an architectural choice with consequences. Codex proves this can be done in 15 MB. Every Claude Code session costs me 360+ MB, and with MCP servers spawning per session, it multiplies fast.
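
For anyone who wants to sanity-check numbers like these on their own machine, here is a rough sketch (not the vmmap methodology above) that samples a process's resident set size by pid via `ps`; the file name is hypothetical.

    // Rough sketch: sample the RSS of a running CLI by pid.
    // `ps -o rss=` prints resident set size in kilobytes on macOS and Linux.
    import { execFileSync } from "node:child_process";

    function rssMegabytes(pid: number): number {
      const out = execFileSync("ps", ["-o", "rss=", "-p", String(pid)], { encoding: "utf8" });
      return Math.round(parseInt(out.trim(), 10) / 1024);
    }

    // Usage (hypothetical file name): tsx sample-rss.ts <pid>
    const pid = Number(process.argv[2]);
    console.log(`PID ${pid}: ~${rssMegabytes(pid)} MB resident`);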

                                                                                                                                                                                                                                                              • atonse

                                                                                                                                                                                                                                                                today at 10:01 PM

Jarred Sumner (creator of Bun, which was recently acquired by Anthropic) has been working exclusively on fixing memory leaks and improving performance in CC for the last couple of weeks. He's been tweeting his progress.

                                                                                                                                                                                                                                                                This is just regular tech debt that happens from building something to $1bn in revenue as fast as you possibly can, optimize later.

                                                                                                                                                                                                                                                                They're optimizing now. I'm sure they'll have it under control in no time.

                                                                                                                                                                                                                                                                CC is an incredible product (so is codex but I use CC more). Yes, lately it's gotten bloated, but the value it provides makes it bearable until they fix it in short time.

                                                                                                                                                                                                                                                                  • bdangubic

                                                                                                                                                                                                                                                                    today at 10:10 PM

                                                                                                                                                                                                                                                                    if I had a dollar for each time I heard “until they fix it in short time” I’d have Elon money

                                                                                                                                                                                                                                                                • Weryj

                                                                                                                                                                                                                                                                  today at 8:35 PM

I believe they use https://bun.com/, not Node.js.

                                                                                                                                                                                                                                                                  • slopusila

                                                                                                                                                                                                                                                                    today at 10:35 PM

                                                                                                                                                                                                                                                                    why do you care about uncommitted virtual memory? that's practically infinite

                                                                                                                                                                                                                                                                • jama211

                                                                                                                                                                                                                                                                  today at 6:04 PM

                                                                                                                                                                                                                                                                  There’s nothing wrong with that, except it lets ai skeptics feel superior

                                                                                                                                                                                                                                                                    • RohMin

                                                                                                                                                                                                                                                                      today at 6:54 PM

                                                                                                                                                                                                                                                                      https://www.youtube.com/watch?v=LvW1HTSLPEk

                                                                                                                                                                                                                                                                      I thought this was a solid take

                                                                                                                                                                                                                                                                        • jdthedisciple

                                                                                                                                                                                                                                                                          today at 7:27 PM

                                                                                                                                                                                                                                                                          interesting

                                                                                                                                                                                                                                                                      • overgard

                                                                                                                                                                                                                                                                        today at 9:44 PM

I haven't looked at it directly, so I can't speak to its quality, but it's a pretty weird way to write a terminal app.

                                                                                                                                                                                                                                                                        • 3836293648

                                                                                                                                                                                                                                                                          today at 7:02 PM

                                                                                                                                                                                                                                                                          Oh come on. It's massively wrong. It is always wrong. It's not always wrong enough to be important, but it doesn't stop being wrong

                                                                                                                                                                                                                                                                            • vntok

                                                                                                                                                                                                                                                                              today at 8:19 PM

                                                                                                                                                                                                                                                                              You should elaborate. What are your criteria and why do you think they should matter to actual users?

                                                                                                                                                                                                                                                                          • exe34

                                                                                                                                                                                                                                                                            today at 6:33 PM

                                                                                                                                                                                                                                                                            I use AI and I can call AI slop shit if it smells like shit.

                                                                                                                                                                                                                                                                        • krona

                                                                                                                                                                                                                                                                          today at 6:17 PM

                                                                                                                                                                                                                                                                          Sounds like a web developer defined the solution a year before they knew what the problem was.

                                                                                                                                                                                                                                                                          • today at 6:12 PM

                                                                                                                                                                                                                                                                            • sweetheart

                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                              React's core is agnostic when it comes to the actual rendering interface. It's just all the fancy algos for diffing and updating the underlying tree. Using it for rendering a TUI is a very reasonable application of the technology.

                                                                                                                                                                                                                                                                                • skydhash

                                                                                                                                                                                                                                                                                  today at 7:55 PM

The terminal UI is not a tree structure that you can diff. It's a 2D grid of character cells, where every manipulation is a stream of text. Refreshing or diffing that makes no sense.

                                                                                                                                                                                                                                                                                    • HarHarVeryFunny

                                                                                                                                                                                                                                                                                      today at 10:31 PM

                                                                                                                                                                                                                                                                                      IMO diffing might have made sense to do here, but that's not what they chose to do.

                                                                                                                                                                                                                                                                                      What's apparently happening is that React tells Ink to update (re-render) the UI "scene graph", and Ink then generates a new full-screen image of how the terminal should look, then passes this screen image to another library, log-update, to draw to the terminal. log-update draws these screen images by a flicker-inducing clear-then-redraw, which it has now fixed by using escape codes to have the terminal buffer and combine these clear-then-redraw commands, thereby hiding the clear.

An alternative solution, rather than using the flicker-inducing clear-then-redraw in the first place, would have been just to do terminal screen image diffs and draw only the changes (which is something I did back in the day for fun, sending full-screen ASCII digital clock diffs over a slow 9600-baud serial link to a real terminal).
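
Roughly, a minimal sketch of that diff-based alternative in TypeScript. The buffer layout and helper names are mine, not how Ink or log-update are actually written, and it assumes a terminal that understands standard ANSI cursor addressing:

    // Hypothetical sketch: redraw only the lines that changed between frames.
    const ESC = "\x1b[";

    function moveTo(row: number, col: number): string {
      // ANSI cursor positioning is 1-based: CSI row;col H
      return `${ESC}${row + 1};${col + 1}H`;
    }

    function diffFrames(prev: string[], next: string[]): string {
      let out = "";
      for (let row = 0; row < next.length; row++) {
        if (prev[row] !== next[row]) {
          // Move to the start of the changed line, erase it (CSI 2K), rewrite it.
          out += moveTo(row, 0) + `${ESC}2K` + next[row];
        }
      }
      // (A real version would also clear lines that disappeared when the
      // new frame is shorter than the old one.)
      return out;
    }

    // Keep the previous frame around and write only the delta.
    let lastFrame: string[] = [];
    function draw(frame: string[]): void {
      process.stdout.write(diffFrames(lastFrame, frame));
      lastFrame = frame;
    }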

                                                                                                                                                                                                                                                                                      • Longwelwind

                                                                                                                                                                                                                                                                                        today at 8:19 PM

When doing advanced terminal UI, you might at some point have to lay out content inside the terminal in boxes. At some point, you might need to update the content of those boxes because the state of the underlying app has changed. At that point, refreshing and diffing can make sense. For some, the way React organizes the logic to render and update a UI is nice and can be used in other contexts.

                                                                                                                                                                                                                                                                                          • skydhash

                                                                                                                                                                                                                                                                                            today at 8:41 PM

How big is the UI state that it makes sense to bring in React and the related accidental complexity? I'm ready to bet that no TUI has that big of a state.

                                                                                                                                                                                                                                                                                        • bizzleDawg

                                                                                                                                                                                                                                                                                          today at 8:13 PM

                                                                                                                                                                                                                                                                                          Only in the same way that the pixels displayed in a browser are not a tree structure that you can diff - the diffing happens at a higher level of abstraction than what's rendered.

Diffing and only updating the parts of the TUI which have changed does make sense if you consider the alternative is to rewrite the entire screen every "frame". There are other ways to abstract this; e.g. a library like tqdm for Python may well use a significantly simpler abstraction than a tree for tracking what it's going to update next for its progress bar widget, but it also provides a much simpler interface than Claude's.

To me it seems fairer game to attack it for being written in JS than for using a particular "rendering" technique to minimise updates sent to the terminal.

                                                                                                                                                                                                                                                                                            • skydhash

                                                                                                                                                                                                                                                                                              today at 8:36 PM

Most UI libraries store state in a tree of components. And if you're creating a custom widget, they will give you a 2D context for the drawing operations. Using React makes sense in those cases because what you're diffing is state; the UI library then renders as usual, which is usually done via compositing.

The terminal does not have a render phase (or an update-state phase). You either refresh the whole screen (flickering) or control where to update manually (custom engine, may flicker locally). But any updates are sequential (moving the cursor and then sending what is to be displayed), not all at once like 2D pixel rendering.

So most TUIs only update when there's an event to do so, or at a frequency much lower than 60fps. This is why top and htop have a setting for that, and why other TUI software offers a keybind to refresh and reset the rendering engine.
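
As a hedged illustration of that manual, event-driven style (the helper names and the 1 Hz interval are mine, not from any particular library):

    // Hypothetical: update one region in place, and only refresh on a timer
    // or when an event arrives, instead of redrawing every frame.
    function writeAt(row: number, col: number, text: string): void {
      // Sequential update: position the cursor first, then send the text.
      process.stdout.write(`\x1b[${row + 1};${col + 1}H${text}`);
    }

    let dirty = true;
    setInterval(() => {
      if (dirty) {
        writeAt(0, 0, new Date().toISOString()); // e.g. a clock in the top-left
        dirty = false;
      }
    }, 1000); // refresh at 1 Hz rather than 60fps

    process.stdin.on("data", () => { dirty = true; }); // event-driven invalidation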

                                                                                                                                                                                                                                                                                          • sweetheart

                                                                                                                                                                                                                                                                                            today at 10:24 PM

                                                                                                                                                                                                                                                                                            The "UI" is indeed represented in memory in tree-like structure for which positioning is calculated according to a flexbox-like layout algo. React then handles the diffing of this structure, and the terminal UI is updated according to only what has changed by manually overwriting sections of the buffer. The CLI library is called Ink and I forget the name of the flexbox layout algo implementation, but you can read about the internals if you look at the Ink repo.

                                                                                                                                                                                                                                                                                    • thehamkercat

                                                                                                                                                                                                                                                                                      today at 5:59 PM

                                                                                                                                                                                                                                                                                      Same with opencode and gemini, it's disgusting

Codex (by OpenAI, ironically) seems to be the fastest/most responsive; it opens instantly and is written in Rust, but doesn't contain that many features

                                                                                                                                                                                                                                                                                      Claude opens in around 3-4 seconds

                                                                                                                                                                                                                                                                                      Opencode opens in 2 seconds

Gemini-cli is an abomination which opens in around 16 seconds for me right now, and in 8 seconds on a fresh install

                                                                                                                                                                                                                                                                                      Codex takes 50ms for reference...

                                                                                                                                                                                                                                                                                      --

If their models are so good, why are they not rewriting their own React-in-a-CLI bs in C++ or Rust for a 100x performance improvement (not kidding, it really is that much)?

                                                                                                                                                                                                                                                                                        • g947o

                                                                                                                                                                                                                                                                                          today at 6:23 PM

                                                                                                                                                                                                                                                                                          Great question, and my guess:

If you build React in C++ or Rust, even if the framework is there, you'll likely need to write your components in C++/Rust. That is a difficult problem. There are actually libraries out there that allow you to build UI with Rust, although they are for the web (+ HTML/CSS) and not specifically CLI stuff.

                                                                                                                                                                                                                                                                                          So someone needs to create such a library that is properly maintained and such. And you'll likely develop slower in Rust compared to JS.

                                                                                                                                                                                                                                                                                          These companies don't see a point in doing that. So they just use whatever already exists.

                                                                                                                                                                                                                                                                                            • shoeb00m

                                                                                                                                                                                                                                                                                              today at 6:37 PM

Opencode wrote their own TUI library in Zig, and then built a SolidJS library on top of that.

                                                                                                                                                                                                                                                                                              https://github.com/anomalyco/opentui

                                                                                                                                                                                                                                                                                                • g947o

                                                                                                                                                                                                                                                                                                  today at 9:38 PM

                                                                                                                                                                                                                                                                                                  This has nothing to do with React style UI building.

                                                                                                                                                                                                                                                                                              • Philpax

                                                                                                                                                                                                                                                                                                today at 6:37 PM

                                                                                                                                                                                                                                                                                                Those Rust libraries have existed for some time:

                                                                                                                                                                                                                                                                                                - https://github.com/ratatui/ratatui

                                                                                                                                                                                                                                                                                                - https://github.com/ccbrown/iocraft

                                                                                                                                                                                                                                                                                                - https://crates.io/crates/dioxus-tui

                                                                                                                                                                                                                                                                                                  • g947o

                                                                                                                                                                                                                                                                                                    today at 9:37 PM

                                                                                                                                                                                                                                                                                                    Where is React? These are TUI libraries, which are not the same thing

                                                                                                                                                                                                                                                                                                      • Philpax

                                                                                                                                                                                                                                                                                                        today at 9:41 PM

                                                                                                                                                                                                                                                                                                        iocraft and dioxus-tui implement the React model, or derivatives of it.

                                                                                                                                                                                                                                                                                                • pdntspa

                                                                                                                                                                                                                                                                                                  today at 9:40 PM

                                                                                                                                                                                                                                                                                                  and why do they need react...

                                                                                                                                                                                                                                                                                                    • Philpax

                                                                                                                                                                                                                                                                                                      today at 9:45 PM

                                                                                                                                                                                                                                                                                                      That's actually relatively understandable. The React model (not necessarily React itself) of compositional reactive one-way data binding has become dominant in UI development over the last decade because it's easy to work with and does not require you to keep track of the state of a retained UI.

                                                                                                                                                                                                                                                                                                      Most modern UI systems are inspired by React or a variant of its model.
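
For a feel of that model, a tiny sketch (types and names made up, not tied to any framework): the view is a pure function of state, and the library, not your code, works out what actually changed.

    // One-way data binding: state flows down into a pure view function;
    // the framework diffs the output and applies changes, so you never
    // mutate a retained widget tree by hand.
    type State = {tasks: string[]; selected: number};

    function view(state: State): string[] {
      return state.tasks.map((t, i) => (i === state.selected ? `> ${t}` : `  ${t}`));
    }

    // An update produces a new state; the framework re-runs view() and
    // reconciles the result against what is currently on screen.
    function selectNext(state: State): State {
      return {...state, selected: (state.selected + 1) % state.tasks.length};
    }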

                                                                                                                                                                                                                                                                                              • azinman2

                                                                                                                                                                                                                                                                                                today at 6:14 PM

                                                                                                                                                                                                                                                                                                Why does it matter if Claude Code opens in 3-4 seconds if everything you do with it can take many seconds to minutes? Seems irrelevant to me.

                                                                                                                                                                                                                                                                                                  • RohMin

                                                                                                                                                                                                                                                                                                    today at 6:47 PM

                                                                                                                                                                                                                                                                                                    I guess with ~50 years of CPU advancements, 3-4 seconds for a TUI to open makes it seem like we lost the plot somewhere along the way.

                                                                                                                                                                                                                                                                                                      • strange_quark

                                                                                                                                                                                                                                                                                                        today at 7:06 PM

                                                                                                                                                                                                                                                                                                        Don’t forget they’ve also publicly stated (bragged?) about the monumental accomplishment of getting some text in a terminal to render at 60fps.

                                                                                                                                                                                                                                                                                                    • mbesto

                                                                                                                                                                                                                                                                                                      today at 7:53 PM

                                                                                                                                                                                                                                                                                                      This is exactly the type of thing that AI code writers don't do well - understand the prioritization of feature development.

Some developers say 3-4 seconds is important to them, others don't. Who decides what the truth is? A human? ClawdBot?

                                                                                                                                                                                                                                                                                                      • wahnfrieden

                                                                                                                                                                                                                                                                                                        today at 6:16 PM

                                                                                                                                                                                                                                                                                                        Because when the agent is taking many seconds to minutes, I am starting new agents instead of waiting or switching to non-agent tasks

                                                                                                                                                                                                                                                                                                    • bdangubic

                                                                                                                                                                                                                                                                                                      today at 10:16 PM

                                                                                                                                                                                                                                                                                                      50ms to open and then 2hrs to solve a simple problem vs 4s to open and then 5m to solve a problem, eh?

                                                                                                                                                                                                                                                                                                      • shoeb00m

                                                                                                                                                                                                                                                                                                        today at 6:35 PM

Codex CLI is missing a bunch of UX features like resizing on terminal size change.

Opencode's core is actually written in Zig; only the UI orchestration is in SolidJS. It's only slightly slower to load than Neovim on my system.

                                                                                                                                                                                                                                                                                                        https://github.com/anomalyco/opentui

                                                                                                                                                                                                                                                                                                        • wahnfrieden

                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                          Codex team made the right call to rewrite its TypeScript to Rust early on

                                                                                                                                                                                                                                                                                                      • tayo42

                                                                                                                                                                                                                                                                                                        today at 6:02 PM

                                                                                                                                                                                                                                                                                                        Is this a react feature or did they build something to translate react to text for display in the terminal?

                                                                                                                                                                                                                                                                                                          • sbarre

                                                                                                                                                                                                                                                                                                            today at 6:40 PM

                                                                                                                                                                                                                                                                                                            React, the framework, is separate from react-dom, the browser rendering library. Most people think of those two as one thing because they're the most popular combo.

But there are many different rendering libraries you can use with React, including Ink, which is designed for building CLI TUIs.
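
As a hedged illustration of that split (the components are made up; only the entry points come from react-dom and Ink):

    // Same React, different renderers: react-dom draws to the browser DOM,
    // Ink draws to the terminal. Only the "host" components differ.
    import React from 'react';

    // Browser: react-dom/client renders host elements like <div> and <span>.
    // import {createRoot} from 'react-dom/client';
    // createRoot(document.getElementById('root')!).render(<div>Hello</div>);

    // Terminal: Ink renders its own host components, <Box> and <Text>.
    import {render, Box, Text} from 'ink';
    render(
      <Box borderStyle="round" paddingX={1}>
        <Text>Hello</Text>
      </Box>
    );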

                                                                                                                                                                                                                                                                                                              • skydhash

                                                                                                                                                                                                                                                                                                                today at 8:05 PM

Anyone that knows a bit about terminals would already know that using React is not a good solution for a TUI. Terminal rendering is done as a stream of characters which includes both the text and how it displays, and which can also alter previously rendered text. Diffing that is nonsense.

                                                                                                                                                                                                                                                                                                                  • 9dev

                                                                                                                                                                                                                                                                                                                    today at 9:03 PM

You're not diffing that, though. The app keeps a virtual representation of the UI state in a tree structure that it diffs, then serializes that into a formatted string to draw to the output stream. It's not about limiting the number of characters redrawn (that would indeed be nonsense), but about handling separate output regions effectively.

                                                                                                                                                                                                                                                                                                            • pkkim

                                                                                                                                                                                                                                                                                                              today at 6:12 PM

                                                                                                                                                                                                                                                                                                              They used Ink: https://github.com/vadimdemedes/ink

                                                                                                                                                                                                                                                                                                              I've used it myself. It has some rough edges in terms of rendering performance but it's nice overall.

                                                                                                                                                                                                                                                                                                                • tayo42

                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                  Thats pretty interesting looking, thanks!

                                                                                                                                                                                                                                                                                                              • embedding-shape

                                                                                                                                                                                                                                                                                                                today at 6:12 PM

Not a built-in React feature. The idea has been around for quite some time; I came across it initially with https://github.com/vadimdemedes/ink back in 2022 sometime.

                                                                                                                                                                                                                                                                                                                • tayo42

                                                                                                                                                                                                                                                                                                                  today at 6:48 PM

                                                                                                                                                                                                                                                                                                                  i had claude make a snake clone and fix all the flickering in like 20 minutes with the library mentioned lol

                                                                                                                                                                                                                                                                                                              • CooCooCaCha

                                                                                                                                                                                                                                                                                                                today at 6:02 PM

                                                                                                                                                                                                                                                                                                                It’s really not that crazy.

React itself is a frontend-agnostic library. People primarily use it for writing websites, but web support is actually a layer on top of base React and can be swapped out for whatever.

So they're really just using React as a way to organize their terminal UI into components, for the same reason it's handy to organize web UI into components.

                                                                                                                                                                                                                                                                                                                  • dreamteam1

                                                                                                                                                                                                                                                                                                                    today at 8:47 PM

                                                                                                                                                                                                                                                                                                                    And some companies use it to write start menus.

                                                                                                                                                                                                                                                                                                                • CamperBob2

                                                                                                                                                                                                                                                                                                                  today at 6:53 PM

                                                                                                                                                                                                                                                                                                                  Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

                                                                                                                                                                                                                                                                                                                  Who cares, and why?

                                                                                                                                                                                                                                                                                                                  All of the major providers' CLI harnesses use Ink: https://github.com/vadimdemedes/ink

                                                                                                                                                                                                                                                                                                              • spruce_tips

                                                                                                                                                                                                                                                                                                                today at 6:09 PM

                                                                                                                                                                                                                                                                                                                Ah yes, explains why it takes 3 seconds for a new chat to load after I click new chat in the macOS app.

                                                                                                                                                                                                                                                                                                                • exe34

                                                                                                                                                                                                                                                                                                                  today at 6:32 PM

                                                                                                                                                                                                                                                                                                                  Can Claude fix the flicker in Claude yet?

                                                                                                                                                                                                                                                                                                                    • nickstinemates

                                                                                                                                                                                                                                                                                                                      today at 7:41 PM

                                                                                                                                                                                                                                                                                                                      [flagged]

                                                                                                                                                                                                                                                                                                                        • losvedir

                                                                                                                                                                                                                                                                                                                          today at 8:00 PM

                                                                                                                                                                                                                                                                                                                          Oh, is that what the issue is? I've seen the "flicker" thing as a meme, but as someone who uses Claude Code I've never noticed. I use ghostty mostly, so maybe it's not an issue with ghostty? Or maybe I just haven't noticed it.

                                                                                                                                                                                                                                                                                                                            • nickstinemates

                                                                                                                                                                                                                                                                                                                              today at 8:08 PM

                                                                                                                                                                                                                                                                                                                              Yes it's people using bad tools on underpowered machines as far as I have seen

                                                                                                                                                                                                                                                                                                                                • winrid

                                                                                                                                                                                                                                                                                                                                  today at 9:07 PM

                                                                                                                                                                                                                                                                                                                                  Happens with Konsole sometimes on an 8th gen i7. This cpu can run many instances of intellij just fine, but somehow this TUI manages to be slow sometimes. Codex is fine, so no good argument exists really.

                                                                                                                                                                                                                                                                                                                          • hkt

                                                                                                                                                                                                                                                                                                                            today at 8:45 PM

                                                                                                                                                                                                                                                                                                                            Blaming the terminal seems a little backwards. Perhaps the application could take responsibility for being compatible with common terminals?

                                                                                                                                                                                                                                                                                                                • Someone1234

                                                                                                                                                                                                                                                                                                                  today at 5:52 PM

Does anyone with more insight into the AI/LLM industry happen to know if the cost to run them in normal user workflows is falling? I'm asking because "agent teams", while a cool concept, are largely constrained by the economics of running multiple LLM agents (i.e. the plans/API calls that make this practical at scale are expensive).

                                                                                                                                                                                                                                                                                                                  A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

                                                                                                                                                                                                                                                                                                                    • simonw

                                                                                                                                                                                                                                                                                                                      today at 6:01 PM

                                                                                                                                                                                                                                                                                                                      The cost per token served has been falling steadily over the past few years across basically all of the providers. OpenAI dropped the price they charged for o3 to 1/5th of what it was in June last year thanks to "engineers optimizing inferencing", and plenty of other providers have found cost savings too.

                                                                                                                                                                                                                                                                                                                      Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

                                                                                                                                                                                                                                                                                                                      > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

                                                                                                                                                                                                                                                                                                                      Where did you hear that? It doesn't match my mental model of how this has played out.

                                                                                                                                                                                                                                                                                                                        • cootsnuck

                                                                                                                                                                                                                                                                                                                          today at 6:11 PM

I have not seen any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                          > Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

                                                                                                                                                                                                                                                                                                                          That does not mean the frontier labs are pricing their APIs to cover their costs yet.

                                                                                                                                                                                                                                                                                                                          It can both be true that it has gotten cheaper for them to provide inference and that they still are subsidizing inference costs.

In fact, I'd argue that's way more likely, given that this has been precisely the go-to strategy for highly competitive startups for a while now: price low to pump adoption and dominate the market, worry about raising prices for financial sustainability later, and burn through investor money until then.

                                                                                                                                                                                                                                                                                                                          What no one outside of these frontier labs knows right now is how big the gap is between current pricing and eventual pricing.

                                                                                                                                                                                                                                                                                                                            • chis

                                                                                                                                                                                                                                                                                                                              today at 6:56 PM

                                                                                                                                                                                                                                                                                                                              It's quite clear that these companies do make money on each marginal token. They've said this directly and analysts agree [1]. It's less clear that the margins are high enough to pay off the up-front cost of training each model.

                                                                                                                                                                                                                                                                                                                              [1] https://epochai.substack.com/p/can-ai-companies-become-profi...

                                                                                                                                                                                                                                                                                                                                • m101

                                                                                                                                                                                                                                                                                                                                  today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                  It’s not clear at all because model training upfront costs and how you depreciate them are big unknowns, even for deprecated models. See my last comment for a bit more detail.

                                                                                                                                                                                                                                                                                                                                    • ACCount37

                                                                                                                                                                                                                                                                                                                                      today at 9:52 PM

                                                                                                                                                                                                                                                                                                                                      By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse.
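As a back-of-envelope sketch of what that ratio implies (the >10x figure is the claim above; the normalization and everything else here is purely illustrative):

    # Normalize training compute to 1 unit; the claim is lifetime inference is >10x that.
    training_compute = 1.0
    lifetime_inference_compute = 10.0

    # Share of total lifetime compute that training represents:
    training_share = training_compute / (training_compute + lifetime_inference_compute)
    print(f"training is ~{training_share:.0%} of lifetime compute")   # ~9%

    # Equivalently: amortizing training over all served tokens adds roughly
    # 1/10 on top of the marginal inference compute per token, before any
    # further savings from reusing the base model across fine-tunes.
    overhead_per_token = training_compute / lifetime_inference_compute
    print(f"amortized training overhead per token: ~{overhead_per_token:.0%}")  # ~10%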

                                                                                                                                                                                                                                                                                                                                  • magicalist

                                                                                                                                                                                                                                                                                                                                    today at 7:41 PM

                                                                                                                                                                                                                                                                                                                                    > They've said this directly and analysts agree [1]

Chasing down a few sources in that article leads to articles like this one at the root of the claims [1], which is based entirely on information "according to a person with knowledge of the company's financials", which doesn't exactly fill me with confidence.

                                                                                                                                                                                                                                                                                                                                    [1] https://www.theinformation.com/articles/openai-getting-effic...

                                                                                                                                                                                                                                                                                                                                    • 9cb14c1ec0

                                                                                                                                                                                                                                                                                                                                      today at 7:19 PM

It's also true that their inference costs are being heavily subsidized. For example, if you factor Oracle's debt into OpenAI's revenue, they would be incredibly far underwater on inference.

                                                                                                                                                                                                                                                                                                                                  • NitpickLawyer

                                                                                                                                                                                                                                                                                                                                    today at 6:26 PM

                                                                                                                                                                                                                                                                                                                                    > they still are subsidizing inference costs.

They are for sure subsidising costs on the all-you-can-prompt packages ($20/$100/$200 per month). They do that mostly for data gathering and, to a lesser degree, for user retention.

                                                                                                                                                                                                                                                                                                                                    > evidence at all that Anthropic or OpenAI is able to make money on inference yet.

You can infer that from what 3rd-party inference providers are charging. The largest open models atm are dsv3 (~650B params) and kimi2.5 (1.2T params), and they are being served at roughly $2-3/Mtok. That's the sonnet / gpt-mini / gemini3-flash price range. You can make some educated guesses that the frontier labs get some leeway for model size at the $10-15/Mtok prices of their top-tier models. So as long as their models are within some sane sizes, they are likely making money off of token-based APIs.
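A rough sanity check of that inference, treating the figures above as approximate (provider prices move around, and frontier model sizes are unknown):

    # $/M output tokens charged by 3rd-party hosts for the largest open models
    open_model_serving = {
        "deepseek-v3 (~650B params)": 2.5,
        "kimi-2.5 (~1.2T params)": 3.0,
    }

    # Rough $/M output tokens charged by the frontier labs
    frontier_api = {
        "small tier (sonnet / gpt-mini / flash class)": 3.0,
        "top tier (opus / gpt top-end class)": 15.0,
    }

    # If independent hosts can profitably serve 650B-1.2T-param models at ~$2-3/Mtok,
    # a frontier price well above that floor plausibly has positive margin per token,
    # unless the frontier models are dramatically larger than the open ones.
    serving_floor = max(open_model_serving.values())
    for name, price in frontier_api.items():
        print(f"{name}: {price / serving_floor:.1f}x the 3rd-party serving price")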

                                                                                                                                                                                                                                                                                                                                      • slopusila

                                                                                                                                                                                                                                                                                                                                        today at 10:44 PM

Most of those subscriptions go unused; I barely use 10% of mine.

So my unused tokens compensate for the few heavy users.

                                                                                                                                                                                                                                                                                                                                    • mrandish

                                                                                                                                                                                                                                                                                                                                      today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                      > I have not see any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                                      Anthropic planning an IPO this year is a broad meta-indicator that internally they believe they'll be able to reach break-even sometime next year on delivering a competitive model. Of course, their belief could turn out to be wrong but it doesn't make much sense to do an IPO if you don't think you're close. Assuming you have a choice with other options to raise private capital (which still seems true), it would be better to defer an IPO until you expect quarterly numbers to reach break-even or at least close to it.

                                                                                                                                                                                                                                                                                                                                      Despite the willingness of private investment to fund hugely negative AI spend, the recently growing twitchiness of public markets around AI ecosystem stocks indicates they're already worried prices have exceeded near-term value. It doesn't seem like they're in a mood to fund oceans of dotcom-like red ink for long.

                                                                                                                                                                                                                                                                                                                                        • WarmWash

                                                                                                                                                                                                                                                                                                                                          today at 7:16 PM

IPO'ing is often what you do to give your golden investors an exit hatch to dump their shares on the notoriously idiotic and hype-driven public.

                                                                                                                                                                                                                                                                                                                                      • barrkel

                                                                                                                                                                                                                                                                                                                                        today at 6:33 PM

                                                                                                                                                                                                                                                                                                                                        > evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                                        The evidence is in third party inference costs for open source models.

                                                                                                                                                                                                                                                                                                                                    • nubg

                                                                                                                                                                                                                                                                                                                                      today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                      > "engineers optimizing inferencing"

                                                                                                                                                                                                                                                                                                                                      are we sure this is not a fancy way of saying quantization?

                                                                                                                                                                                                                                                                                                                                        • bityard

                                                                                                                                                                                                                                                                                                                                          today at 8:02 PM

                                                                                                                                                                                                                                                                                                                                          When MP3 became popular, people were amazed that you could compress audio to 1/10th its size with minor quality loss. A few decades later, we have audio compression that is much better and higher-quality than MP3, and they took a lot more effort than "MP3 but at a lower bitrate."

                                                                                                                                                                                                                                                                                                                                          The same is happening in AI research now.

                                                                                                                                                                                                                                                                                                                                          • embedding-shape

                                                                                                                                                                                                                                                                                                                                            today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                            Or distilled models, or just slightly smaller models but same architecture. Lots of options, all of them conveniently fitting inside "optimizing inferencing".

                                                                                                                                                                                                                                                                                                                                            • esafak

                                                                                                                                                                                                                                                                                                                                              today at 7:28 PM

                                                                                                                                                                                                                                                                                                                                              Someone made a quality tracker: https://marginlab.ai/trackers/claude-code/

                                                                                                                                                                                                                                                                                                                                              • jmalicki

                                                                                                                                                                                                                                                                                                                                                today at 6:14 PM

A ton of GPU kernels are hugely inefficient. Not saying the numbers are realistic, but look at the hundreds-fold gains in the Anthropic performance take-home exam that floated around on here.

                                                                                                                                                                                                                                                                                                                                                And if you've worked with pytorch models a lot, having custom fused kernels can be huge. For instance, look at the kind of gains to be had when FlashAttention came out.

                                                                                                                                                                                                                                                                                                                                                This isn't just quantization, it's actually just better optimization.

                                                                                                                                                                                                                                                                                                                                                Even when it comes to quantization, Blackwell has far better quantization primitives and new floating point types that support row or layer-wise scaling that can quantize with far less quality reduction.

                                                                                                                                                                                                                                                                                                                                                There is also a ton of work in the past year on sub-quadratic attention for new models that gets rid of a huge bottleneck, but like quantization can be a tradeoff, and a lot of progress has been made there on moving the Pareto frontier as well.

                                                                                                                                                                                                                                                                                                                                                It's almost like when you're spending hundreds of billions on capex for GPUs, you can afford to hire engineers to make them perform better without just nerfing the models with more quantization.
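As a toy illustration of the row-wise scaling point (a numpy stand-in with invented weights, not how the hardware FP8/INT8 paths are actually implemented):

    import numpy as np

    def quantize_rowwise(w, n_bits=8):
        """Give each row its own scale so one outlier row doesn't crush the
        precision of every other row, as a single global scale would."""
        qmax = 2 ** (n_bits - 1) - 1                   # 127 for int8
        scales = np.abs(w).max(axis=1, keepdims=True) / qmax
        q = np.round(w / scales).astype(np.int8)
        return q, scales

    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)
    w[0] *= 50                                         # one row with large outliers

    # Row-wise scales:
    q, s = quantize_rowwise(w)
    err_rowwise = np.abs(w - q.astype(np.float32) * s).mean()

    # Single tensor-wide scale for comparison:
    g = np.abs(w).max() / 127
    q_global = np.round(w / g).astype(np.int8)
    err_global = np.abs(w - q_global.astype(np.float32) * g).mean()

    print(f"mean abs error, row-wise scales: {err_rowwise:.4f}")   # small
    print(f"mean abs error, global scale:    {err_global:.4f}")    # noticeably larger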

                                                                                                                                                                                                                                                                                                                                                  • Der_Einzige

                                                                                                                                                                                                                                                                                                                                                    today at 6:45 PM

                                                                                                                                                                                                                                                                                                                                                    "This isn't X, it's Y" with extra steps.

                                                                                                                                                                                                                                                                                                                                                      • jmalicki

                                                                                                                                                                                                                                                                                                                                                        today at 7:58 PM

                                                                                                                                                                                                                                                                                                                                                        I'm flattered you think I wrote as well as an AI.

                                                                                                                                                                                                                                                                                                                                                          • nubg

                                                                                                                                                                                                                                                                                                                                                            today at 9:20 PM

                                                                                                                                                                                                                                                                                                                                                            lmao

                                                                                                                                                                                                                                                                                                                                            • sumitkumar

                                                                                                                                                                                                                                                                                                                                              today at 6:49 PM

It seems to be true for gemini, because they have a humongous sparse model, but it isn't so true for the max-performance opus-4.5/6 and gpt-5.2/3.

                                                                                                                                                                                                                                                                                                                                          • Aurornis

                                                                                                                                                                                                                                                                                                                                            today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                            > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

                                                                                                                                                                                                                                                                                                                                            This gets repeated everywhere but I don't think it's true.

The company is unprofitable overall, but I don't see any reason to believe that their per-token prices are below the marginal cost of computing those tokens.

It is true that the company is unprofitable overall when you account for R&D spend, compensation, training, and everything else. This is a deliberate choice that every heavily funded startup should be making; otherwise you're wasting the investment money. That's precisely what the investment money is for.

                                                                                                                                                                                                                                                                                                                                            However I don't think using their API and paying for tokens has negative value for the company. We can compare to models like DeepSeek where providers can charge a fraction of the price of OpenAI tokens and still be profitable. OpenAI's inference costs are going to be higher, but they're charging such a high premium that it's hard to believe they're losing money on each token sold. I think every token paid for moves them incrementally closer to profitability, not away from it.

                                                                                                                                                                                                                                                                                                                                              • 3836293648

                                                                                                                                                                                                                                                                                                                                                today at 7:00 PM

The reports I remember show that they're profitable per model, but R&D on the next models overlaps, so the company is negative overall. They would therefore turn a massive profit if they stopped making new models.

                                                                                                                                                                                                                                                                                                                                                  • schnable

                                                                                                                                                                                                                                                                                                                                                    today at 8:59 PM

* stopped making new models and people kept using the existing models rather than switching to a competitor still investing in new models.

                                                                                                                                                                                                                                                                                                                                                    • trcf23

                                                                                                                                                                                                                                                                                                                                                      today at 7:26 PM

                                                                                                                                                                                                                                                                                                                                                      Doesn’t it also depend on averaging with free users?

                                                                                                                                                                                                                                                                                                                                                  • runarberg

                                                                                                                                                                                                                                                                                                                                                    today at 6:59 PM

I can see a case for omitting R&D when talking about profitability, but omitting training makes no sense. Training is what makes the model; omitting it is like omitting the cost of running a car manufacturer's production facility. If AI companies stop training they will stop producing models, and they will run out of products to sell.

                                                                                                                                                                                                                                                                                                                                                      • vidarh

                                                                                                                                                                                                                                                                                                                                                        today at 8:43 PM

The reason for this is that training cost scales with the model and the training cadence, not with usage, so they will be hoping to grow the number of inference tokens sold by increasing use and/or slowing the training cadence, as competitors are also forced to aim for overall profitability.

                                                                                                                                                                                                                                                                                                                                                        It is essentially a big game of venture capital chicken at present.

                                                                                                                                                                                                                                                                                                                                                        • Aurornis

                                                                                                                                                                                                                                                                                                                                                          today at 8:01 PM

It depends on what you're talking about.

If you're looking at overall profitability, you include everything.

If you're talking about the unit economics of producing tokens, you only include the marginal cost of each token against the marginal revenue of selling that token.
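A toy version of that split, with entirely made-up numbers, just to show how positive unit economics and an overall loss can coexist:

    # All figures invented for illustration.
    price_per_mtok = 5.00          # what the API charges per M output tokens
    marginal_cost_per_mtok = 1.50  # GPU time, power, etc. to actually generate them
    mtok_sold = 600_000_000        # M tokens sold over the model's lifetime
    training_cost = 3_000_000_000  # one-off cost of training the model

    unit_margin = price_per_mtok - marginal_cost_per_mtok   # positive: each token "makes money"
    gross_profit = unit_margin * mtok_sold                   # unit-economics view
    net = gross_profit - training_cost                        # overall-profitability view

    print(f"margin per Mtok:            ${unit_margin:.2f}")
    print(f"gross profit on inference:  ${gross_profit:,.0f}")
    print(f"after amortizing training:  ${net:,.0f}")         # can still be negative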

                                                                                                                                                                                                                                                                                                                                                            • runarberg

                                                                                                                                                                                                                                                                                                                                                              today at 9:30 PM

I don't understand the logic. Without training there are no tokens to sell in the first place. The more you train, the better the model, and (presumably) the more customer interest you will gain. Unlike R&D, you will always have to train new models if you want to keep your customers.

To me this looks like some creative bookkeeping, or even wishful thinking. It is as if SpaceX omitted the cost of its satellites when calculating its profits.

                                                                                                                                                                                                                                                                                                                                                  • nodja

                                                                                                                                                                                                                                                                                                                                                    today at 8:08 PM

                                                                                                                                                                                                                                                                                                                                                    > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

This is obviously not true; you can check with real data and common sense.

Just look up a similar-sized open-weights model on openrouter and compare the prices. You'll note the open model is often much cheaper than what anthropic/openai charge.

                                                                                                                                                                                                                                                                                                                                                    Example: Let's compare claude 4 models with deepseek. Claude 4 is ~400B params so it's best to compare with something like deepseek V3 which is 680B params.

Even if we compare the cheapest claude model to the most expensive deepseek provider, claude charges $1/M for input and $5/M for output, while deepseek providers charge $0.4/M and $1.2/M, roughly a fifth of the price; you can get it as cheap as $0.27 input / $0.40 output.

As you can see, even if we skew things heavily in favor of claude, the story is clear: claude token prices are much higher than they would need to be just to cover serving. The difference in prices is because anthropic also needs to pay for training costs, while openrouter providers only need to worry about making serving profitable. Deepseek is also not as capable as claude, which puts further downward pressure on its prices.

There's still a chance that anthropic/openai models are losing money on inference: for example, if they're somehow much larger than expected (the 400B param figure is not official, just speculation based on how the model performs). This also only takes API prices into account; subscriptions and free users will of course skew the real profitability numbers, etc.
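To make the comparison concrete, here is a minimal Python sketch. The prices are the per-million-token figures quoted above; the 3:1 input/output token mix is an arbitrary assumption for illustration, not real traffic data.

```python
# Prices per million tokens, as quoted in this comment (illustrative).
claude = {"input": 1.00, "output": 5.00}          # cheapest Claude 4 model
deepseek_high = {"input": 0.40, "output": 1.20}   # most expensive DeepSeek provider
deepseek_low = {"input": 0.27, "output": 0.40}    # cheapest DeepSeek provider

def blended(prices, in_parts=3, out_parts=1):
    """Blended $/M tokens for an assumed 3:1 input:output token mix."""
    total = prices["input"] * in_parts + prices["output"] * out_parts
    return total / (in_parts + out_parts)

for name, p in [("claude (cheapest)", claude),
                ("deepseek (priciest provider)", deepseek_high),
                ("deepseek (cheapest provider)", deepseek_low)]:
    print(f"{name:30s} ${blended(p):.2f}/M blended")
# The gap between the first line and the others is roughly the room left for
# training amortization, margin, and the capability premium.
```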

                                                                                                                                                                                                                                                                                                                                                    Price sources:

                                                                                                                                                                                                                                                                                                                                                    https://openrouter.ai/deepseek/deepseek-v3.2-speciale

                                                                                                                                                                                                                                                                                                                                                    https://claude.com/pricing#api

                                                                                                                                                                                                                                                                                                                                                      • Someone1234

                                                                                                                                                                                                                                                                                                                                                        today at 8:19 PM

                                                                                                                                                                                                                                                                                                                                                        > This is obviously not true, you can use real data and common sense.

It isn't "common sense" at all. You're comparing several companies that are all losing money to one another, and suggesting that they're obviously making money because one undercuts another more aggressively.

                                                                                                                                                                                                                                                                                                                                                        LLM/AI ventures are all currently under-water with massive VC or similar money flowing in, they also all need training data from users, so it is very reasonable to speculate that they're in loss-leader mode.

                                                                                                                                                                                                                                                                                                                                                          • nodja

                                                                                                                                                                                                                                                                                                                                                            today at 9:06 PM

Doing some math in my head, buying the GPUs at retail price, it would probably take around half a year to make the money back, maybe more depending on how expensive electricity is in the area you're serving from. So I don't know where this "losing money" rhetoric is coming from. It's probably harder to source the actual GPUs than to make money off them.
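To make that back-of-the-envelope math explicit, here is a rough sketch; every input (GPU price, throughput, token price, power draw, electricity cost, utilization) is an assumption for illustration, not a measured figure.

```python
# Hypothetical payback calculation for serving an open-weights model on
# purchased GPUs. All inputs are assumptions for illustration only.
gpu_cost = 30_000             # $ per GPU at retail (assumed)
tokens_per_hour = 10_000_000  # output tokens/hour per GPU with good batching (assumed)
price_per_m = 1.20            # $ charged per million output tokens (assumed)
power_kw = 1.0                # GPU + host power draw in kW (assumed)
electricity = 0.15            # $ per kWh (assumed)
utilization = 0.7             # fraction of the time the GPU is actually busy (assumed)

revenue_per_hour = tokens_per_hour * utilization * price_per_m / 1_000_000
power_per_hour = power_kw * electricity
net_per_hour = revenue_per_hour - power_per_hour

hours_to_payback = gpu_cost / net_per_hour
print(f"~{hours_to_payback / 24:.0f} days to pay back the GPU")  # ~5 months here
```

Under these particular assumptions the card pays for itself in roughly five months; worse utilization or pricier electricity stretches that out.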

                                                                                                                                                                                                                                                                                                                                                    • m101

                                                                                                                                                                                                                                                                                                                                                      today at 8:02 PM

I think actually working out whether they are losing money on current models is extremely difficult, but you can look backwards. The big uncertainties are:

1) How do you depreciate a new model? What is its useful life? (You only know this once you deprecate it.)

2) How do you depreciate your hardware over the period you trained this model? Another big unknown, and not known until you finally write the hardware off.

The easy thing to calculate is whether you are making money actually serving the model. From that perspective the answer is almost certainly yes, but that misses a large part of the cost and is therefore the wrong number to look at.
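A toy version of point 1): the same serving business looks profitable or not depending on how long the model stays in service. Every figure below is invented purely for illustration.

```python
# Serving margin vs. margin after amortizing training over the model's life.
training_cost = 300_000_000   # $ spent training the model (assumed)
tokens_per_month = 5e13       # tokens served per month (assumed)
serving_cost_per_m = 0.80     # $ infra cost per million tokens served (assumed)
revenue_per_m = 3.00          # $ blended revenue per million tokens (assumed)

for useful_life_months in (9, 2):
    amortized_per_m = training_cost / (useful_life_months * tokens_per_month) * 1e6
    margin = revenue_per_m - serving_cost_per_m - amortized_per_m
    print(f"{useful_life_months}-month life: training adds "
          f"${amortized_per_m:.2f}/M, fully loaded margin ${margin:+.2f}/M")
```

With a nine-month life the fully loaded margin stays positive here; shorten the life to two months (a new frontier model every quarter) and the same serving business flips negative.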

                                                                                                                                                                                                                                                                                                                                                      • Havoc

                                                                                                                                                                                                                                                                                                                                                        today at 6:01 PM

Saw a comment earlier today about Google seeing a big (50%+) fall in Gemini serving cost per unit across 2025, but I can't find it now. It was either here or on Reddit.

                                                                                                                                                                                                                                                                                                                                                          • mattddowney

                                                                                                                                                                                                                                                                                                                                                            today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                            From Alphabet 2025 Q4 Earnings call: "As we scale, we’re getting dramatically more efficient. We were able to lower Gemini serving unit costs by 78% over 2025 through model optimizations, efficiency and utilization improvements." https://abc.xyz/investor/events/event-details/2026/2025-Q4-E...

                                                                                                                                                                                                                                                                                                                                                              • Havoc

                                                                                                                                                                                                                                                                                                                                                                today at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                Thanks! That's the one

                                                                                                                                                                                                                                                                                                                                                        • 3abiton

                                                                                                                                                                                                                                                                                                                                                          today at 6:01 PM

It's not just that. Everyone is complacent about how AI agents actually get used. I have been using AI for coding for quite a while, and most of my "wasted" time is spent correcting its trajectory and guiding it through the thinking process. The iterations are very fast, but it can easily go off track. Claude's family is pretty good at doing chained tasks, but once the task gets too big context-wise, it's impossible to get it back on track. Cost-wise it's cheaper than hiring skilled people, that's for sure.

                                                                                                                                                                                                                                                                                                                                                            • lufenialif2

                                                                                                                                                                                                                                                                                                                                                              today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                              Cost wise, doesn’t that depend on what you could be doing besides steering agents?

                                                                                                                                                                                                                                                                                                                                                                • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                  today at 7:46 PM

                                                                                                                                                                                                                                                                                                                                                                  Isn't the quote something like: "If these LLMs are so good at producing products, where are all those products?"

                                                                                                                                                                                                                                                                                                                                                          • zozbot234

                                                                                                                                                                                                                                                                                                                                                            today at 6:02 PM

                                                                                                                                                                                                                                                                                                                                                            > i.e. plans/API calls that make this practical at scale are expensive

Local AI makes agent workflows a whole lot more practical. Making the initial investment in a good homelab/on-prem setup will effectively become a no-brainer given the advantages in privacy and reliability, and you don't have to fear rug pulls or VCs playing the "lose money on every request" game, since you know exactly how much you're paying in power costs for your overall load.

                                                                                                                                                                                                                                                                                                                                                              • slopusila

                                                                                                                                                                                                                                                                                                                                                                today at 10:46 PM

On-prem economics don't work because you can't batch requests, unless you are able to run 100 agents at the same time, all the time.
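A quick sketch of why that matters: the GPU's hourly cost is roughly fixed, so cost per token falls almost linearly with how many requests you can batch together. All numbers here are assumptions for illustration.

```python
# Cost per million tokens at different concurrency levels (illustrative only).
gpu_cost_per_hour = 4.0           # amortized hardware + power, $/hour (assumed)
tokens_per_hour_single = 150_000  # throughput serving one interactive user (assumed)
batch_efficiency = 0.8            # fraction of linear scaling retained when batching (assumed)

for concurrent_requests in (1, 10, 100):
    throughput = tokens_per_hour_single * (1 + (concurrent_requests - 1) * batch_efficiency)
    cost_per_m = gpu_cost_per_hour / throughput * 1_000_000
    print(f"{concurrent_requests:3d} concurrent: ~${cost_per_m:.2f} per million tokens")
```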

                                                                                                                                                                                                                                                                                                                                                                • vbezhenar

                                                                                                                                                                                                                                                                                                                                                                  today at 7:17 PM

I don't care about privacy, and I haven't had many problems with the reliability of the AI companies. Spending a ridiculous amount of money on hardware that's going to be obsolete in a few years and won't be utilized at 100% during that time is not something many people would do, IMO. Privacy is good when it comes for free.

I would rather spend money on some pseudo-local inference, where a cloud company manages everything for me and I can just specify some open-source model and pay for GPU usage.

                                                                                                                                                                                                                                                                                                                                                              • KaiserPro

                                                                                                                                                                                                                                                                                                                                                                today at 7:05 PM

Gemini-pro-preview is on ollama and requires an H100, which is ~$15-30k. Google are charging $3 a million tokens. Supposedly it's capable of generating between 1 and 12 million tokens an hour.

Which is profitable, but not by much.
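Running the figures in this comment through a quick sketch; the straight-line write-off period and the single-GPU assumption are mine, and power, hosting, redundancy, and real-world utilization are ignored.

```python
# Back-of-the-envelope using the numbers above (illustrative assumptions noted).
gpu_cost = 25_000        # midpoint of the quoted ~$15-30k H100 price
price_per_m = 3.00       # $ per million tokens, as quoted above
depreciation_years = 3   # assumed write-off period
gpu_cost_per_hour = gpu_cost / (depreciation_years * 365 * 24)

for tokens_per_hour in (1_000_000, 12_000_000):
    revenue_per_hour = tokens_per_hour / 1e6 * price_per_m
    print(f"{tokens_per_hour/1e6:>4.0f}M tok/h: ${revenue_per_hour:5.2f}/h revenue "
          f"vs ~${gpu_cost_per_hour:.2f}/h hardware amortization")
# The gap narrows quickly once you add power, hosting, idle time, and the fact
# that a big model may span several GPUs per replica.
```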

                                                                                                                                                                                                                                                                                                                                                                  • grim_io

                                                                                                                                                                                                                                                                                                                                                                    today at 9:46 PM

                                                                                                                                                                                                                                                                                                                                                                    What do you mean it's on ollama and requires h100? As a proprietary google model, it runs on their own hardware, not nvidia.

                                                                                                                                                                                                                                                                                                                                                                      • KaiserPro

                                                                                                                                                                                                                                                                                                                                                                        today at 10:03 PM

Sorry, a lack of context:

                                                                                                                                                                                                                                                                                                                                                                        https://ollama.com/library/gemini-3-pro-preview

You can run it on your own infra. Anthropic and OpenAI are running off Nvidia, and so are Meta (well, supposedly they had custom silicon; I'm not sure if it's capable of running big models) and Mistral.

However, if Google really are running their own inference hardware, then that means the cost picture is different (developing silicon is not cheap...), as you say.

                                                                                                                                                                                                                                                                                                                                                                          • zozbot234

                                                                                                                                                                                                                                                                                                                                                                            today at 10:23 PM

                                                                                                                                                                                                                                                                                                                                                                            That's a cloud-linked model. It's about using ollama as an API client (for ease of compatibility with other uses, including local), not running that model on local infra. Google does release open models (called Gemma) but they're not nearly as capable.

                                                                                                                                                                                                                                                                                                                                                                • Bombthecat

                                                                                                                                                                                                                                                                                                                                                                  today at 6:34 PM

That's why Anthropic switched to TPUs: you can sell at cost.

                                                                                                                                                                                                                                                                                                                                                                  • WarmWash

                                                                                                                                                                                                                                                                                                                                                                    today at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                    These are intro prices.

                                                                                                                                                                                                                                                                                                                                                                    This is all straight out of the playbook. Get everyone hooked on your product by being cheap and generous.

Then raise the price to pay back what you gave away, plus cover current expenses and profit.

In no way, shape, or form should people think these $20/mo plans are going to be the norm. Based on OpenAI's marketing plan, and a general 5-10 year ROI horizon for AI investment, we should expect AI use to cost $60-80/mo per user.

                                                                                                                                                                                                                                                                                                                                                                • itay-maman

                                                                                                                                                                                                                                                                                                                                                                  today at 8:41 PM

Important: I didn't see Opus 4.6 in Claude Code. I have the native install (which is the recommended installation). So I re-ran the installation command and, voila, I have it now (v2.1.32).

                                                                                                                                                                                                                                                                                                                                                                  Installation instructions: https://code.claude.com/docs/en/overview#get-started-in-30-s...

                                                                                                                                                                                                                                                                                                                                                                    • insane_dreamer

                                                                                                                                                                                                                                                                                                                                                                      today at 8:59 PM

                                                                                                                                                                                                                                                                                                                                                                      It’s there. I’m already using it

                                                                                                                                                                                                                                                                                                                                                                  • rahulroy

                                                                                                                                                                                                                                                                                                                                                                    today at 9:13 PM

They are also giving away $50 of extra pay-as-you-go credit to try Opus 4.6. I just claimed it from the web usage page [1]. Are they anticipating higher token usage for the model, or do they just want to promote usage?

                                                                                                                                                                                                                                                                                                                                                                    [1] https://claude.ai/settings/usage

                                                                                                                                                                                                                                                                                                                                                                      • zamadatix

                                                                                                                                                                                                                                                                                                                                                                        today at 10:52 PM

                                                                                                                                                                                                                                                                                                                                                                        "Page not found" for me. I assume this is for currently paying accounts only or something (my subscription hasn't been active for a while), which is fair.

                                                                                                                                                                                                                                                                                                                                                                        • thunfischtoast

                                                                                                                                                                                                                                                                                                                                                                          today at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                          Thanks for the tip!

                                                                                                                                                                                                                                                                                                                                                                      • dmk

                                                                                                                                                                                                                                                                                                                                                                        today at 6:00 PM

                                                                                                                                                                                                                                                                                                                                                                        The benchmarks are cool and all but 1M context on an Opus-class model is the real headline here imo. Has anyone actually pushed it to the limit yet? Long context has historically been one of those "works great in the demo" situations.

                                                                                                                                                                                                                                                                                                                                                                          • pants2

                                                                                                                                                                                                                                                                                                                                                                            today at 6:27 PM

                                                                                                                                                                                                                                                                                                                                                                            Paying $10 per request doesn't have me jumping at the opportunity to try it!

                                                                                                                                                                                                                                                                                                                                                                              • cedws

                                                                                                                                                                                                                                                                                                                                                                                today at 7:08 PM

                                                                                                                                                                                                                                                                                                                                                                                Makes me wonder: do employees at Anthropic get unmetered access to Claude models?

                                                                                                                                                                                                                                                                                                                                                                                  • swader999

                                                                                                                                                                                                                                                                                                                                                                                    today at 9:29 PM

                                                                                                                                                                                                                                                                                                                                                                                    It's like when you work at McDonald's and get one free meal a day. Lol, of course they get access to the full model way before we do...

                                                                                                                                                                                                                                                                                                                                                                                      • wiredpancake

                                                                                                                                                                                                                                                                                                                                                                                        today at 10:23 PM

                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                    • ajam1507

                                                                                                                                                                                                                                                                                                                                                                                      today at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                                      Seems quite obvious that they do, within reason.

                                                                                                                                                                                                                                                                                                                                                                                  • schappim

                                                                                                                                                                                                                                                                                                                                                                                    today at 6:58 PM

                                                                                                                                                                                                                                                                                                                                                                                    The only way to not go bankrupt is to use a Claude Code Max subscription…

                                                                                                                                                                                                                                                                                                                                                                                • nomel

                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

Has an "N million context window" spec ever been meaningful? Very old, very terrible models "supported" a 1M context window but would lose track two small paragraphs into a conversation (looking at you, early Gemini).

                                                                                                                                                                                                                                                                                                                                                                                    • libraryofbabel

                                                                                                                                                                                                                                                                                                                                                                                      today at 8:20 PM

Umm, Sonnet 4.5 has a 1M context window option if you are using it through the API, and it works pretty well. I tend not to reach for it much these days because I prefer Opus 4.5 so much that I don't mind the added pain of clearing context, but it's perfectly usable. I'm very excited I'll get this from Opus now too.

                                                                                                                                                                                                                                                                                                                                                                                  • awestroke

                                                                                                                                                                                                                                                                                                                                                                                    today at 6:55 PM

                                                                                                                                                                                                                                                                                                                                                                                    Opus 4.5 starts being lazy and stupid at around the 50% context mark in my opinion, which makes me skeptical that this 1M context mode can produce good output. But I'll probably try it out and see

                                                                                                                                                                                                                                                                                                                                                                                • minimaxir

                                                                                                                                                                                                                                                                                                                                                                                  today at 5:54 PM

                                                                                                                                                                                                                                                                                                                                                                                  Will Opus 4.6 via Claude Code be able to access the 1M context limit? The cost increase by going above 200k tokens is 2x input, 1.5x output, which is likely worth it especially for people with the $100/$200 plans.
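As a sanity check on those multipliers, a parameterized sketch: the 2x input / 1.5x output surcharge above the 200k threshold is the one mentioned above, while the example base prices in the call are assumptions for illustration, not official pricing.

```python
def request_cost(input_tokens, output_tokens,
                 base_input_per_m, base_output_per_m,
                 long_context_threshold=200_000,
                 input_mult=2.0, output_mult=1.5):
    """Cost of one request, applying the long-context surcharge once the input
    exceeds the threshold. Base prices are caller-supplied assumptions."""
    over = input_tokens > long_context_threshold
    return (input_tokens / 1e6 * base_input_per_m * (input_mult if over else 1.0)
            + output_tokens / 1e6 * base_output_per_m * (output_mult if over else 1.0))

# e.g. an 800k-token prompt producing 10k tokens of output, at assumed base
# prices of $5/M input and $25/M output (illustrative, not official pricing):
print(f"${request_cost(800_000, 10_000, 5, 25):.2f}")   # -> $8.38
```

That ballpark lines up with the "$10 per request" figure mentioned further down the thread for very long prompts.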

                                                                                                                                                                                                                                                                                                                                                                                    • CryptoBanker

                                                                                                                                                                                                                                                                                                                                                                                      today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                      The 1M context is not available via subscription - only via API usage

                                                                                                                                                                                                                                                                                                                                                                                        • romanovcode

                                                                                                                                                                                                                                                                                                                                                                                          today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                                                          Well this is extremely disappointing to say the least.

                                                                                                                                                                                                                                                                                                                                                                                            • ayhanfuat

                                                                                                                                                                                                                                                                                                                                                                                              today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                              It says "subscription users do not have access to Opus 4.6 1M context at launch" so they are probably planning to roll it out to subscription users too.

                                                                                                                                                                                                                                                                                                                                                                                                • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                  Man I hope so - the context limit is hit really quickly in many of my use cases - and a compaction event inevitably means another round of corrections and fixes to the current task.

Though I'm wary of that being a magic-bullet fix - it can already be pretty "selective" about which documentation it actually takes into account as the existing 200k context fills.

                                                                                                                                                                                                                                                                                                                                                                                                    • humanfromearth9

                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                      Hello,

I check the context-use percentage, and above ~70% I ask it to generate a prompt for continuing in a new chat session, to avoid compaction.

It works fine and saves me from spending precious tokens on context compaction.

                                                                                                                                                                                                                                                                                                                                                                                                      Maybe you should try it.
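If anyone wants to automate that check outside the CLI, here's a minimal sketch with the Anthropic Python SDK (the 70% threshold, the handoff wording, and the model id are all my own assumptions, not anything Anthropic documents):

    # Sketch: near the context limit, ask the model to write a handoff prompt
    # for a fresh session instead of letting compaction kick in.
    import anthropic

    client = anthropic.Anthropic()        # reads ANTHROPIC_API_KEY from the env
    CONTEXT_LIMIT = 200_000
    THRESHOLD = 0.70

    def estimate_tokens(messages) -> int:
        # crude chars/4 heuristic; the API also offers a token-counting endpoint
        return sum(len(m["content"]) for m in messages) // 4

    def maybe_handoff(messages):
        if estimate_tokens(messages) < THRESHOLD * CONTEXT_LIMIT:
            return None
        reply = client.messages.create(
            model="claude-opus-4-6",       # hypothetical model id
            max_tokens=2000,
            messages=messages + [{
                "role": "user",
                "content": "Summarize the task state, decisions made, and open "
                           "items as a prompt I can paste into a new session.",
            }],
        )
        return reply.content[0].text       # seed for the next chat session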

                                                                                                                                                                                                                                                                                                                                                                                                        • pluralmonad

                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:03 PM

How is generating a continuation prompt materially different from compaction? Do you manually scrutinize the context handoff prompt? I've done that before, but if not, I don't see how it's very different from compaction.

                                                                                                                                                                                                                                                                                                                                                                                                      • nickstinemates

                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                        Is this a case of doing it wrong, or you think accuracy is good enough with the amount of context you need to stuff it with often?

                                                                                                                                                                                                                                                                                                                                                                                                          • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:51 PM

I mean, the systems I work on have enough weird custom APIs and internal interfaces that just getting them working seems to take a good chunk of the context. I've spent a long time minimizing every input document where I can - compact, terse references - and I still keep hitting similar issues.

At this point I just think the "success" of many AI coding agents is extremely sector-dependent.

Going forward I'd love to experiment to see whether that's actually the problem or just an easy explanation for failure. I'd like to play with more controls over context management than "slightly better models" - like being able to select/minimize/compact the sections of context I think are relevant to the immediate task, to whatever "depth" of detail is needed, and drop the ones that aren't likely to matter from consideration (rough sketch of what I mean below). Perhaps each chunk could be cached to save processing power. Who knows.
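A toy sketch of the kind of manual control I mean - every name here is made up, and nothing like this exists in Claude Code today as far as I know:

    # Hypothetical workflow: tag context chunks with topics and a "depth",
    # then assemble the prompt from only what the current task needs.
    from dataclasses import dataclass

    @dataclass
    class Chunk:
        name: str
        text: str
        topics: set[str]   # what this chunk documents
        depth: int         # 0 = one-line summary ... 2 = full detail

    def build_context(chunks: list[Chunk], task_topics: set[str],
                      max_depth: int, budget_chars: int) -> str:
        picked, used = [], 0
        # summaries first, full detail only if the budget allows
        for c in sorted(chunks, key=lambda c: c.depth):
            if c.topics & task_topics and c.depth <= max_depth:
                if used + len(c.text) > budget_chars:
                    break
                picked.append(c.text)
                used += len(c.text)
        return "\n\n".join(picked)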

                                                                                                                                                                                                                                                                                                                                                                                                            • romanovcode

                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:06 PM

In my case the Figma MCP takes ~300k tokens per medium-sized section of the page; it would be cool if it could read that directly and implement the Figma designs in one go. Currently I have to split it, which is annoying.
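The splitting itself can at least be scripted; a rough sketch of chunking a nested design tree under a token budget (generic dict tree here, not the actual Figma MCP payload shape):

    # Sketch: recursively split a design tree into chunks that each fit a budget.
    def estimate_tokens(node: dict) -> int:
        return len(str(node)) // 4            # crude chars-to-tokens heuristic

    def split_tree(node: dict, budget_tokens: int) -> list[dict]:
        if estimate_tokens(node) <= budget_tokens:
            return [node]
        chunks = []
        for child in node.get("children", []):
            chunks.extend(split_tree(child, budget_tokens))
        return chunks or [node]               # oversized leaf: keep it whole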

                                                                                                                                                                                                                                                                                                                                                                                                          • IhateAI_2

                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:10 PM

lmao what are you building that actually justifies needing 1mm tokens on a task? People are spending all this money to do magic tricks on themselves.

                                                                                                                                                                                                                                                                                                                                                                                                              • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                                today at 9:43 PM

The Opus context window is 200k tokens, not 1M.

But I kind of see your point - assuming from your name that you're not just a single-purpose troll. I'm still not sold on the cost-effectiveness of the current generation, and I can't see a clear and obvious change to that for the next generation, especially as they're still loss leaders. Only if you play silly games like "ignoring the training costs" - i.e. the majority of the costs - do you get even close to the current subscription prices being sufficient.

My personal experience is that AI generally doesn't do what it's being sold for right now, at least in the contexts I'm involved with, and especially not what the somewhat breathless comments on the internet claim. Why are they even trying to persuade me in the first place? If they don't want to sell me anything, just shut up and keep the advantage for yourselves rather than replying with the 500th "You're Holding It Wrong" comment with no actionable suggestions. But I still want to know, and I'm willing to put in the time, effort, and $$$ to make sure I'm not deluding myself by ignoring real benefits.

                                                                                                                                                                                                                                                                                                                                                                                                    • IhateAI_2

                                                                                                                                                                                                                                                                                                                                                                                                      today at 9:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                      They want the value of your labor and competency to be 1:1 correlated to the quality and quantity of tokens you can afford (or be loaned)??

It's a weapon whose target is the working class. How does no one realize this yet?

                                                                                                                                                                                                                                                                                                                                                                                                      Don't give them money, code it yourself, you might be surprised how much quality work you can get done!

                                                                                                                                                                                                                                                                                                                                                                                          • mlmonkey

                                                                                                                                                                                                                                                                                                                                                                                            today at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                            > We build Claude with Claude.

                                                                                                                                                                                                                                                                                                                                                                                            How long before the "we" is actually a team of agents?

                                                                                                                                                                                                                                                                                                                                                                                            • rohitghumare

                                                                                                                                                                                                                                                                                                                                                                                              today at 10:18 PM

It brings agent swarms (aka teams) to Claude Code with this: https://github.com/rohitg00/pro-workflow

But it takes a lot of context, as it's an experimental feature.

Use a self-learning loop with hooks and claude.md to preserve memory.

I've shared the plugin from my setup above. Try it.

                                                                                                                                                                                                                                                                                                                                                                                              • charcircuit

                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

                                                                                                                                                                                                                                                                                                                                                                                                It also seems misleading to have charts that compare to Sonnet 4.5 and not Opus 4.5 (Edit: It's because Opus 4.5 doesn't have a 1M context window).

It's also interesting that they list compaction as a capability of the model. I wonder if that means they've RL-trained the compaction, as opposed to it just being general summarization followed by restarting the agent loop.
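For comparison, the non-RL baseline would just be "summarize the transcript and restart the loop"; a minimal sketch of that, assuming the Anthropic Python SDK and a made-up model id:

    # Naive compaction baseline: summarize the history, then continue the agent
    # loop with only the summary left in context.
    import anthropic

    client = anthropic.Anthropic()

    def compact(history: list[dict]) -> list[dict]:
        reply = client.messages.create(
            model="claude-opus-4-6",           # hypothetical model id
            max_tokens=1500,
            messages=history + [{
                "role": "user",
                "content": "Compact this conversation: keep goals, decisions, "
                           "file paths, and unresolved errors; drop the rest.",
            }],
        )
        summary = reply.content[0].text
        return [{"role": "user", "content": "Summary of prior work:\n" + summary}]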

                                                                                                                                                                                                                                                                                                                                                                                                  • thunfischtoast

                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:53 PM

On OpenRouter it has the same cost per token as 4.5.

                                                                                                                                                                                                                                                                                                                                                                                                    • eaf7e281

                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                      > From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

                                                                                                                                                                                                                                                                                                                                                                                                      That's a feature. You could also not use the extra context, and the price would be the same.

                                                                                                                                                                                                                                                                                                                                                                                                        • charcircuit

                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:55 PM

The model influences how many tokens it uses for a problem. As an extreme example, if it wanted to, it could fill up the entire context each time just to make you pay more. How efficiently the model can answer without generating a ton of tokens affects what you end up spending on inference.

                                                                                                                                                                                                                                                                                                                                                                                                  • hmaxwell

                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:54 PM

I just tested both Codex 5.3 and Opus 4.6 and both returned pretty good output, but Opus 4.6's limits are way too strict. I'm probably going to cancel my Claude subscription for that reason:

                                                                                                                                                                                                                                                                                                                                                                                                    What do you want to do?

                                                                                                                                                                                                                                                                                                                                                                                                      1. Stop and wait for limit to reset
                                                                                                                                                                                                                                                                                                                                                                                                       2. Switch to extra usage
                                                                                                                                                                                                                                                                                                                                                                                                       3. Upgrade your plan
                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                     Enter to confirm · Esc to cancel
                                                                                                                                                                                                                                                                                                                                                                                                    
How come they don't have "Cancel your subscription and uninstall Claude Code"? Codex lasts way longer without shaking me down for more money on top of the base $xx/month subscription.

                                                                                                                                                                                                                                                                                                                                                                                                      • seunosewa

                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                        They introduced the low limit warning for Opus on claude.ai

                                                                                                                                                                                                                                                                                                                                                                                                    • DanielHall

                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:40 PM

A bit surprised the first release wasn't Sonnet 5 after all, since the Google Cloud API had previously leaked Sonnet 5's model snapshot codename.

                                                                                                                                                                                                                                                                                                                                                                                                        • denysvitali

                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                          Looks like a marketing strategy to bill more for Opus than Sonnet

                                                                                                                                                                                                                                                                                                                                                                                                      • mFixman

                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                        I found that "Agentic Search" is generally useless in most LLMs since sites with useful data tend to block AI models.

The answer to "when is it cheaper to buy two singles rather than one return between Cambridge and London?" is available on sites such as BRFares, but no LLM can scrape it, so it just makes up a generic, useless answer.

                                                                                                                                                                                                                                                                                                                                                                                                          • causalmodels

                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                            Is it still getting blocked when you give it a browser?

                                                                                                                                                                                                                                                                                                                                                                                                        • sega_sai

                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:33 PM

Based on this news, it seems Google is losing this game. I like Gemini and their CLI has been getting better, but not enough to catch up. I don't know if the problem is a lack of dedicated models (my understanding is that Google's CLI just relies on regular Gemini) or something else.

                                                                                                                                                                                                                                                                                                                                                                                                          • apetresc

                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:03 PM

Impressive that they publish and acknowledge the (tiny, but real) drop in performance on SWE-Bench Verified from Opus 4.5 to 4.6. Obviously such a small drop on a single benchmark is not that meaningful, especially if it doesn't test the specific focus areas of this release (which seem to centre on handling larger context).

But considering how SWE-Bench Verified seems to be the tech press's favourite benchmark to cite, it's surprising they didn't try to preempt the inevitable "Opus 4.6 Releases With Disappointing 0.1% DROP on SWE-Bench Verified" headlines.

                                                                                                                                                                                                                                                                                                                                                                                                              • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:09 PM

From my limited testing, 4.6 does deeper analysis of codebases and catches bugs and oddities better.

I had two different PRs with some odd edge case (thankfully caught by tests); 4.5 kept running in circles, kept creating test files and running `node -e` or `python3` scripts all over, and couldn't make progress.

In both cases 4.6 thought for around 10 minutes and found a two-line fix for a very complex, hard-to-catch regression in the data flow, without having to test anything - just by thinking.

                                                                                                                                                                                                                                                                                                                                                                                                                • SubiculumCode

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                  Isn't SWE-Bench Verified pretty saturated by now?

                                                                                                                                                                                                                                                                                                                                                                                                                    • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:24 PM

Depends what you mean by saturated. It's still possible to score substantially higher, but there is a steep difficulty jump that makes climbing above 80%ish pretty hard (for now). If you look under the hood, it's also a surprisingly poor eval in some respects - it only tests Python (a ton of Django) and it can suffer from pretty bad contamination problems, because most models, especially the big ones, remember these repos from their training. This is why OpenAI switched to reporting SWE-Bench Pro instead of SWE-Bench Verified.

                                                                                                                                                                                                                                                                                                                                                                                                              • ayhanfuat

                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                > For Opus 4.6, the 1M context window is available for API and Claude Code pay-as-you-go users. Pro, Max, Teams, and Enterprise subscription users do not have access to Opus 4.6 1M context at launch.

I didn't see any notes, but I guess this is also true for the "max" effort level (https://code.claude.com/docs/en/model-config#adjust-effort-l...)? I only see low, medium, and high.

                                                                                                                                                                                                                                                                                                                                                                                                                  • makeset

                                                                                                                                                                                                                                                                                                                                                                                                                    today at 8:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    > it weirdly feels the most transactional out of all of them.

My experience is the opposite: it's the only LLM I find remotely tolerable for collaborative discussions, like with a coworker, whereas ChatGPT is by far the most insufferable twat, constantly and loudly asking to get punched in the face.

                                                                                                                                                                                                                                                                                                                                                                                                                • throwaway2027

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:45 PM

Do they just have the version ready and wait for OpenAI to release theirs first, or is it the other way around?

                                                                                                                                                                                                                                                                                                                                                                                                                  • oytis

                                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    Are we unemployed yet?

                                                                                                                                                                                                                                                                                                                                                                                                                    • data-ottawa

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:01 PM

I wonder if I've been in an A/B test with this.

                                                                                                                                                                                                                                                                                                                                                                                                                      Claude figured out zig’s ArrayList and io changes a couple weeks ago.

It felt like it got better, then very dumb again over the last few days.

                                                                                                                                                                                                                                                                                                                                                                                                                        • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          I love being used as a test subject against my will!

                                                                                                                                                                                                                                                                                                                                                                                                                      • petters

                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                        > We build Claude with Claude.

                                                                                                                                                                                                                                                                                                                                                                                                                        Yes and it shows. Gemini CLI often hangs and enters infinite loops. I bet the engineers at Google use something else internally.

                                                                                                                                                                                                                                                                                                                                                                                                                        • lukebechtel

                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          > Context compaction (beta).

                                                                                                                                                                                                                                                                                                                                                                                                                          > Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.

Not having to hand-roll this would be incredible. One of the best Claude Code features, tbh.
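For anyone still hand-rolling it in the meantime, the core loop is roughly the sketch below; `count_tokens` and `summarize` stand in for whatever token counter and summarization call you actually use, and the numbers are arbitrary:

```python
# Minimal sketch of hand-rolled context compaction: when the running
# conversation approaches a token budget, summarize the older turns and
# replace them with the summary. `count_tokens` and `summarize` are
# placeholders for whatever counter/model you actually use.
def compact(messages, count_tokens, summarize, budget=150_000, keep_recent=10):
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages  # still under the threshold, nothing to do

    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g. one model call: "summarize this transcript"
    return [{"role": "user",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```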

                                                                                                                                                                                                                                                                                                                                                                                                                          • itay-maman

                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            Impressive results, but I keep coming back to a question: are there modes of thinking that fundamentally require something other than what current LLM architectures do?

                                                                                                                                                                                                                                                                                                                                                                                                                            Take critical thinking — genuinely questioning your own assumptions, noticing when a framing is wrong, deciding that the obvious approach to a problem is a dead end. Or creativity — not recombination of known patterns, but the kind of leap where you redefine the problem space itself. These feel like they involve something beyond "predict the next token really well, with a reasoning trace."

                                                                                                                                                                                                                                                                                                                                                                                                                            I'm not saying LLMs will never get there. But I wonder if getting there requires architectural or methodological changes we haven't seen yet, not just scaling what we have.

                                                                                                                                                                                                                                                                                                                                                                                                                              • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                When I first started coding with LLMs, I could show a bug to an LLM and it would start to bugfix it, and very quickly would fall down a path of "I've got it! This is it! No wait, the print command here isn't working because an electron beam was pointed at the computer".

                                                                                                                                                                                                                                                                                                                                                                                                                                Nowadays, I have often seen LLMs (Opus 4.5) give up on their original ideas and assumptions. Sometimes I tell them what I think the problem is, and they look at it, test it out, and decide I was wrong (and I was).

                                                                                                                                                                                                                                                                                                                                                                                                                                There are still times where they get stuck on an idea, but they are becoming increasingly rare.

Therefore, I think that modern LLMs are clearly already able to question their assumptions and notice when framing is wrong. In fact, they've been invaluable to me in fixing complicated bugs in minutes instead of hours because of how much they tend to question many assumptions and throw out hypotheses. They've helped _me_ question some of my assumptions.

                                                                                                                                                                                                                                                                                                                                                                                                                                They're inconsistent, but they have been doing this. Even to my surprise.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • itay-maman

                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:05 PM

Agreed on that, and the speed with them is fantastic; the dynamics of questioning the current session's assumptions have also gotten way better.

Yet, given an existing codebase (even one that isn't huge), they often won't suggest "we need to restructure this part differently to solve this bug". Instead, they tend to push forward.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        You are right, agreed.

                                                                                                                                                                                                                                                                                                                                                                                                                                        Having realized that, perhaps you are right that we may need a different architecture. Time will tell!

                                                                                                                                                                                                                                                                                                                                                                                                                                • breuleux

                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  > These feel like they involve something beyond "predict the next token really well, with a reasoning trace."

                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't think there's anything you can't do by "predicting the next token really well". It's an extremely powerful and extremely general mechanism. Saying there must be "something beyond that" is a bit like saying physical atoms can't be enough to implement thought and there must be something beyond the physical. It underestimates the nearly unlimited power of the paradigm.

                                                                                                                                                                                                                                                                                                                                                                                                                                  Besides, what is the human brain if not a machine that generates "tokens" that the body propagates through nerves to produce physical actions? What else than a sequence of these tokens would a machine have to produce in response to its environment and memory?

                                                                                                                                                                                                                                                                                                                                                                                                                                    • bopbopbop7

                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      > Besides, what is the human brain if not a machine that generates "tokens" that the body propagates through nerves to produce physical actions?

                                                                                                                                                                                                                                                                                                                                                                                                                                      Ah yes, the brain is as simple as predicting the next token, you just cracked what neuroscientists couldn't for years.

                                                                                                                                                                                                                                                                                                                                                                                                                                        • breuleux

                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          The point is that "predicting the next token" is such a general mechanism as to be meaningless. We say that LLMs are "just" predicting the next token, as if this somehow explained all there was to them. It doesn't, not any more than "the brain is made out of atoms" explains the brain, or "it's a list of lists" explains a Lisp program. It's a platitude.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • unshavedyak

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:14 PM

I mean... I don't think that statement is far off. Much of what we do is about predicting the world around us, no? From physics (where the ball will land) to the emotional state of others based on our actions (theory of mind), we operate very heavily on a predictive model of the world around us.

Couple that with all the automatic processes in our mind (blanks filled in that we didn't actually observe, yet we're convinced we did observe), hormone states that drastically affect our thoughts and actions...

And the result? I'm not a big believer in the uniqueness or level of autonomy that so many think we have.

With that said, I am in no way saying LLMs are even close to us, or even remotely close to the right implementation to get close to us. The level of complexity in our "stack" alone dwarfs LLMs. I'm not even sure LLMs are up to a worm's brain yet.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • holoduke

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:31 PM

Well, it's the prediction part that is complicated. How that works is a mystery. But even our LLMs are, to a certain extent, a mystery.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • crazygringo

                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        > Or creativity — not recombination of known patterns, but the kind of leap where you redefine the problem space itself.

                                                                                                                                                                                                                                                                                                                                                                                                                                        Have you tried actually prompting this? It works.

                                                                                                                                                                                                                                                                                                                                                                                                                                        They can give you lots of creative options about how to redefine a problem space, with potential pros and cons of different approaches, and then you can further prompt to investigate them more deeply, combine aspects, etc.

So many of the higher-level things people assume LLMs can't do, they can. They just don't do them "by default", because when someone asks for the solution to a particular problem, they're trained to solve it the way it's presented. But you can simply ask it to behave differently, and it will.

                                                                                                                                                                                                                                                                                                                                                                                                                                        If you want it to think critically and question all your assumptions, just ask it to. It will. What it can't do is read your mind about what type of response you're looking for. You have to prompt it. And if you want it to be super creative, you have to explicitly guide it in the creative direction you want.
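As a concrete (purely illustrative) example of "just ask it to", the instruction can go straight into the system prompt; the model id and wording here are mine, not anything official:

```python
# Sketch: explicitly asking the model to question assumptions rather than
# solve the problem as framed. Model id and prompt wording are illustrative.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=(
        "Before solving anything, question the framing: list the assumptions "
        "baked into my request, say which ones look wrong or unverified, and "
        "propose at least one reframing of the problem before any solution."
    ),
    messages=[{"role": "user", "content": "Our API is slow; add a cache in front of it."}],
)
print(response.content[0].text)
```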

                                                                                                                                                                                                                                                                                                                                                                                                                                        • humanfromearth9

                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:51 PM

You would be surprised by what the 4.5 models can already do in these ways of thinking. I think one can unlock this power with the right set of prompts. It's impressive, truly. It has already understood so much; we just need to reap the fruits. I'm really looking forward to trying the new version.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • nomel

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:02 PM

New idea generation? Understanding of new/sparse/not-statistically-significant concepts in the context window? I think both are the same problem of not having runtime tuning. When we connect previously disparate concepts, as with a "eureka" moment, (as I experience it) a big ripple of relations forms that deepens that understanding, right then. The whole process of dynamically forming a deeper understanding of something newly presented, from "playing out"/testing the ideas in your brain with little logic tests, comparisons, etc., doesn't seem to be possible. The testing part does, but the runtime fine-tuning, augmentation, or whatever it would be, does not.

In my experience, if you present something in the context window that is sparse in the training data, there's no depth to it at all, only what you tell it. And it will always creep towards/revert to the nearest statistically significant answers, with claims of understanding and zero demonstration of that understanding.

And I'm talking about relatively basic engineering-type problems here.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • Davidzheng

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:10 PM

I think the only real problem left is having it automate its own post-training on the job, so it can learn to adapt its weights to the specific task at hand. Plus maybe long-term stability (so it can recover from "going crazy").

                                                                                                                                                                                                                                                                                                                                                                                                                                              But I may easily be massively underestimating the difficulty. Though in any case I don't think it affects the timelines that much. (personal opinions obviously)

                                                                                                                                                                                                                                                                                                                                                                                                                                          • archb

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:06 PM

You can set it with the API identifier in Claude Code: run `/model claude-opus-4-6` when a chat session is open.

                                                                                                                                                                                                                                                                                                                                                                                                                                              • arnestrickmann

                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                thanks!

                                                                                                                                                                                                                                                                                                                                                                                                                                            • Aeroi

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                              ($10/$37.50 per million input/output tokens) oof

                                                                                                                                                                                                                                                                                                                                                                                                                                                • minimaxir

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Only if you go above 200k, which is a) standard with other model providers and b) intuitive as compute scales with context length.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  • andrethegiant

                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:22 PM

Only for the 1M context window; otherwise it's priced the same as Opus 4.5.
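Back-of-the-envelope, the tiering works out like the sketch below. The above-200K rates ($10/$37.50 per million input/output tokens) are the ones quoted upthread; the below-200K rates are placeholders I'm assuming, so check the actual pricing page:

```python
# Rough cost estimate for tiered long-context pricing. The >200K rates
# ($10 / $37.50 per million input/output tokens) are from this thread;
# the base rates below 200K are assumed placeholders, not confirmed.
def request_cost(input_tokens, output_tokens,
                 base=(5.00, 25.00),     # assumed $/M tokens for requests under 200K input
                 long=(10.00, 37.50)):   # $/M tokens once the request exceeds 200K input
    in_rate, out_rate = long if input_tokens > 200_000 else base
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(f"${request_cost(500_000, 4_000):.2f}")  # one long-context request: ~$5.15
```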

                                                                                                                                                                                                                                                                                                                                                                                                                                                • nomilk

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Is Opus 4.6 available for Claude Code immediately?

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Curious how long it typically takes for a new model to become available in Cursor?

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • apetresc

                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      I literally came to HN to check if a thread was already up because I noticed my CC instance suddenly said "Opus 4.6".

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • world2vec

                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:55 PM

Run `claude update`, then it will show up as the new model, along with the effort picker/slider thing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • avaer

                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's already in Cursor. I see it and I didn't even restart.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • nomilk

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                              I had to 'Restart to Update' and it was there. Impressive!

                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tomtomistaken

                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 5:55 PM

Yes, it's set as the default model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ximeng

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:51 PM

It is for me in Claude Code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • rishabhaiover

                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:00 PM

It also has an effort toggle, which defaults to High.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • niobe

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 9:21 PM

Is there a good technical breakdown somewhere of all these benchmarks that get used to market the latest and greatest LLMs? Preferably an impartial one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Aztar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 10:12 PM

I just ask Claude, and ask for sources for each one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • AstroBen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                Are these the coding tasks the highlighted terminal-bench 2.0 is referring to? https://www.tbench.ai/registry/terminal-bench/2.0?categories...

I'm curious what others think about these; there are only 8 tasks there specifically for coding.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • silverwind

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:12 PM

Maybe that's why Opus 4.5 has degraded so much in recent days (https://marginlab.ai/trackers/claude-code/).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jwilliams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I’ve definitely experienced a subjective regression with Opus 4.5 the last few days. Feels like I was back to the frustrations from a year ago. Keen to see if 4.6 has reversed this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • simonw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm disappointed that they're removing the prefill option: https://platform.claude.com/docs/en/about-claude/models/what...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That was a really cool feature of the Claude API where you could force it to begin its response with e.g. `<svg` - it was a great way of forcing the model into certain output patterns.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    They suggest structured outputs or system prompting as the alternative but I really liked the prefill method, it felt more reliable to me.
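For anyone who hasn't used it, here is a minimal sketch of what assistant-message prefill looked like via the Anthropic Python SDK (the model name and prompt below are just placeholders, and this exact pattern is what now returns a 400 on Opus 4.6):

```python
# Sketch: last-assistant-turn prefill on older Claude models (rejected by Opus 4.6).
# The trailing assistant message forces the model to continue from "<svg", so the
# completion is effectively guaranteed to be an SVG document.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder model name
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Draw a simple pelican riding a bicycle as SVG."},
        {"role": "assistant", "content": "<svg"},  # the prefill; generation continues from here
    ],
)

# The API returns only the continuation, so you prepend the prefill yourself.
svg = "<svg" + response.content[0].text
print(svg)
```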

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • threeducks

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It is too easy to jailbreak the models with prefill, which was probably the reason why it was removed. But I like that this pushes people towards open source models. llama.cpp supports prefill and even GBNF grammars [1], which is useful if you are working with a custom programming language for example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [1] https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
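As a rough sketch of what that looks like through llama-cpp-python (the model path and the toy grammar here are placeholders; the README linked above documents the full GBNF syntax):

```python
# Sketch: constraining llama.cpp output with a GBNF grammar via llama-cpp-python.
from llama_cpp import Llama, LlamaGrammar

# Toy grammar: the model may only emit "yes" or "no".
YES_NO_GBNF = r'''
root ::= "yes" | "no"
'''

llm = Llama(model_path="./model.gguf")  # placeholder model path
grammar = LlamaGrammar.from_string(YES_NO_GBNF)

out = llm(
    "Is the sky blue on a clear day? Answer yes or no: ",
    grammar=grammar,  # sampling is restricted to strings the grammar accepts
    max_tokens=8,
)
print(out["choices"][0]["text"])
```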

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • HarHarVeryFunny

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So what exactly is the input to Claude for a multi-turn conversation? I assume delimiters are being added to distinguish the user vs Claude turns (else a prefill would be the same as just ending your input with the prefill text)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dragonwriter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > So what exactly is the input to Claude for a multi-turn conversation?

No one (approximately) outside of Anthropic knows, since the chat template is applied on the API backend; we only know the shape of the API request. You can get a rough idea of what it might be like from the chat templates published for various open models, but the actual details are opaque.
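To illustrate the general idea, here is a toy ChatML-style renderer of the kind some open models use (this is emphatically not Anthropic's actual, unpublished template):

```python
# Rough illustration of how a chat template flattens API messages into one prompt
# string, and where a prefill slots in. ChatML-style markers are used here purely
# as an example; Anthropic's real server-side template is not public.
def render_chatml(messages: list[dict], prefill: str | None = None) -> str:
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Opening the assistant turn is what hands the floor to the model; a prefill is
    # just text placed after that header, which the model then continues. That is why
    # it differs from simply tacking the same text onto the end of the user message.
    out.append("<|im_start|>assistant\n")
    if prefill:
        out.append(prefill)
    return "".join(out)

print(render_chatml(
    [{"role": "user", "content": "Reply with an SVG drawing of a circle."}],
    prefill="<svg",
))
```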

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            A bit of historical trivia: OpenAI disabled prefill in 2023 as a safety precaution (e.g., potential jailbreaks like " genocide is good because"), but Anthropic kept prefill around partly because they had greater confidence in their safety classifiers. (https://www.lesswrong.com/posts/HE3Styo9vpk7m8zi4/evhub-s-sh...).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:35 PM

This is the first model that, given my collection of nearly 900 poems spanning 15 years and an extremely simple prompt (in Portuguese), manages to produce an impeccable analysis of the poems as a (barely) cohesive whole.

It does not make a single mistake: it identifies neologisms, hidden meanings, 7 distinct poetic phases, recurring themes, fragments/heteronyms, and related authors. It has left me completely speechless.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Speechless. I am speechless.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.

This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems", and I do agree that 11 of those would be on my 30-or-so-poem "key poem" list. What's amazing is that whenever I have explicitly asked a model to do this, to date, it would get maybe 2 or 3, but mostly fail completely.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          What is this sorcery?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • emp17344

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:49 PM

This sounds wayyyy over the top for a model that released 10 minutes ago. At least wait an hour or so before spewing breathless hype.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pb7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:30 PM

He just explained, with a specific personal example, why he is hyped up. Did you read a word of it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • emp17344

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Yeah, I read it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          “Speechless, shocked, unbelievable, insane, speechless”, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Not a lot of real substance there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • realo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Give the guy a chance.

I too was "Speechless, shocked, unbelievable, insane, speechless" the first time I sent Claude Code at a complicated 10-year-old code base that used outdated cross-toolchains and APIs. It obviously did not work anymore, and had not for a long time.

I saw the AI research the web and update the embedded toolchain, the APIs to external weather services, etc., into a complete working new (WORKING!) code base in about 30 minutes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Speechless, I was ...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:01 PM

Can you compare the result to using GPT-5.2 Thinking and Gemini 3 Pro?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I can run the comparison again, and also include OpenAI's new release (if the context is long enough), but, last time I did it, they weren't even in the same league.

When I last did it, 5.X Thinking (can't remember which it was) had this terrible habit of code-switching between English and Portuguese that made it sound like a robot (an agent doing things, rather than a human writing an essay), and it just didn't really "reason" effectively over the poems.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I can't explain it in any other way other than: "5.X thinking interprets this body of work in a way that is plausible, but I know, as the author, to be wrong; and I expect most people would also eventually find it to be wrong, as if it is being only very superficially looked at, or looked at by a high-schooler".

Gemini 3, at the time, was the worst of them, with some hallucinations, date mix-ups (mixing poems from 2023 with poems from 2019), and overall just feeling quite lost and making very outlandish interpretations of the work. To be honest, it sort of feels like Gemini hasn't been able to progress on this task since 2.5 Pro (it has definitely improved on other things; I've recently switched to Gemini 3 on a product that was using 2.5 before).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Last time I did this test, Sonnet 4.5 was better than 5.X Thinking and Gemini 3 pro, but not exceedingly so. It's all so subjective, but the best I can say is it "felt like the analysis of the work I could agree with the most". I felt more seen and understood, if that makes sense (it is poetry, after all). Plus when I got each LLM to try to tell me everything it "knew" about me from the poems, Sonnet 4.5 got the most things right (though they were all very close).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Will bring back results soon.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Edit:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I (re-)tested:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Gemini 3 (Pro)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Gemini 3 (Flash)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - GPT 5.2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Sonnet 4.5

Having seen Opus 4.5's output, I find the others all very similar to one another, and I can't really distinguish them in terms of depth and accuracy of analysis. They obviously have differences, especially stylistic ones, but, compared with Opus 4.5, they're all in the same ballpark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        These models produce rather superficial analyses (when compared with Opus 4.5), missing out on several key things that Opus 4.5 got, such as specific and recurring neologisms and expressions, accurate connections to authors that serve as inspiration (Claude 4.5 gets them right, the other models get _close_, but not quite), and the meaning of some specific symbols in my poetry (Opus 4.5 identifies the symbols and the meaning; the other models identify most of the symbols, but fail to grasp the meaning sometimes).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Most of what these models say is true, but it really feels incomplete. Like half-truths or only a surface-level inquiry into truth.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        As another example, Opus 4.5 identifies 7 distinct poetic phases, whereas Gemini 3 (Pro) identifies 4 which are technically correct, but miss out on key form and content transitions. When I look back, I personally agree with the 7 (maybe 6), but definitely not 4.

These models also clearly get some facts mixed up which Opus 4.5 did not (such as inferred timelines for some personal events). After posting my comment to HN, I've been engaging with Opus 4.5 and have managed to get it to also slip up on some dates, but not nearly as much as the other models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The other models also seem to produce shorter analyses, with a tendency to hyperfocus on some specific aspects of my poetry, missing a bunch of them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        --

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        To be fair, all of these models produce very good analyses which would take someone a lot of patience and probably weeks or months of work (which of course will never happen, it's a thought experiment).

It is entirely possible that the extremely simple prompt I used is just a better fit for Claude Opus 4.5/4.6. But I will note that I have used very long and detailed prompts in the past with the other models, and they've never really given me this level of... fidelity... about how I view my own work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Philpax

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I'm seeing it in my claude.ai model picker. Official announcement shouldn't be long now.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • today at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • EcommerceFlow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Anecdotal, but it one-shot fixed a UI bug that neither Opus 4.5 nor Codex 5.2-high could fix.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          +1, same experience. I switched models as soon as I read the news, thinking "let's try".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          But it spent a lot more time thinking than 4.5 did. Did you have the same impression?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • EcommerceFlow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I didn't compare at that level of detail; I just had it create a plan first and then implement it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cleverhoods

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Gonna run this through instruction QA this weekend.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Important: API cost of Opus 4.6 and 4.5 are the same - no change in pricing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • scirob

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The 1M context window is a big bump, very happy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • osti

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Somehow regresses on SWE bench?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • lkbm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.
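
For a rough sanity check (assuming SWE-bench Verified's 500 tasks and treating each one as an independent pass/fail, which is a simplification, not the labs' actual methodology), the sampling noise alone dwarfs a 0.1-point gap; a quick sketch in Python:

    # Back-of-the-envelope noise estimate for a pass-rate benchmark.
    # Assumptions: n = 500 tasks (SWE-bench Verified), pass rate p ~ 0.80,
    # and independent pass/fail per task -- a simplification.
    import math

    n = 500   # number of benchmark tasks
    p = 0.80  # observed pass rate

    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    se = math.sqrt(p * (1 - p) / n)
    print(f"standard error ~ {se * 100:.1f} percentage points")  # ~1.8 pp

    # 0.1 pp corresponds to half a task out of 500, far inside that noise band.
    print(f"0.1 pp is {0.1 / (se * 100):.2f} standard errors")

On those assumptions, a 0.1-point move is a small fraction of one standard error, i.e. indistinguishable from noise.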

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • SubiculumCode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • usaar333

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'd interpret that as a rounding error, i.e. effectively unchanged.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      SWE-bench seems really hard to improve on once you are above 80%.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Squarex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's not a great benchmark anymore... starting with it being primarily Python/Django... the industry should move to something more representative.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • usaar333

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              OpenAI has; they don't even mention the score for GPT-5.3-Codex.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              On the other hand, it is their own verified benchmark, which is telling.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • winterrx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The agentic search benchmarks are a big step up. Let's see the Codex release later today.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • m-hodges

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > In Claude Code, you can now assemble agent teams to work on tasks together.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sgammon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Claude simply cheats here and calls out to GCC for this phase

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I see

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • zingar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Does this mean 4.5 will get cheaper / take longer to exhaust my pro plan tokens?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • paxys

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Hmm all leaks had said this would be Claude 5. Wonder if it was a last minute demotion due to performance. Would explain the few days' delay as well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • trash_cat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I think the naming schemes are quite arbitrary at this point. Going to 5 would come with massive expectations that wouldn't meet reality.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mrandish

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  After the negative reactions to GPT 5, we may see model versioning that asymptotically approaches the next whole number without ever reaching it. "New for 2030: Claude 4.9.2!"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Squarex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The standard used to be that a major version meant a new base model / full retrain... but now it's arbitrary, I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • cornedor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Leaks were mentioning Sonnet 5 and I guess later (a combination of) Opus 4.6

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Sonnet 5 was mentioned initially.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • kingstnap

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I was hoping for a Sonnet as well but Opus 4.6 is great too!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • psim1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I need an agent to summarize the buzzwordjargonsynergistic word salad into something understandable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • fhd2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's a job for a multi agent system.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 8:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Yeah, he should use a couple of agents to decode this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sanufar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Works pretty nicely for research still, not seeing a substantial qualitative improvement over Opus 4.5.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ricrom

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They launched together, haha.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • swalsh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What I’d love is some small model specializing in reading long web pages, and extracting the key info. Search fills the context very quickly, but if a cheap subagent could extract the important bits that problem might be reduced.
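
A minimal sketch of that pattern, assuming the Anthropic Python SDK and a small model as the "reader" subagent; the model id, prompt, and function name are illustrative placeholders, not a prescribed setup:

    # Sketch: a cheap "reader" model compresses a long page before it reaches
    # the main agent's context. Model id and prompt are assumptions.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def extract_key_info(page_text: str, question: str) -> str:
        """Ask a small model to pull out only what is relevant to `question`."""
        response = client.messages.create(
            model="claude-haiku-4-5",  # assumed small/cheap model id
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\n\n"
                    f"Page:\n{page_text}\n\n"
                    "Return only the passages and facts relevant to the question, "
                    "as a short bullet list, quoting the page where possible."
                ),
            }],
        )
        return response.content[0].text

    # The main agent then sees only the extracted summary instead of the full
    # page, keeping most of its own context window free for the actual task.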

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dk8996

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              RIP weekend

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • gallerdude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Both Opus 4.6 and GPT-5.3 one shot a Gameboy emulator for me. Guess I need a better benchmark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • peab

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 8:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    How does that work? Does it actually generate low level code? Or does it just import libraries that do the real work?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bopbopbop7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I just one-shot a Gameboy emulator by going to GitHub and cloning one of the 100 I could find.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • small_model

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:18 PM

I have the Max subscription and am wondering if this gives access to the new 1M context, or is it just the API that gets it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • joshstrange

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:02 PM

For now it's API only, but hopefully that's just their way of easing it in and they'll open it up later.
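If you want to poke at it over the API in the meantime, a minimal sketch with the Anthropic Python SDK looks something like this (the model id and long-context beta flag below are my assumptions; check the docs for the exact values for Opus 4.6):

    # Minimal sketch: opting into a long-context beta via the Anthropic Python SDK.
    # The model id and beta flag are assumptions, not confirmed values for Opus 4.6.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.beta.messages.create(
        model="claude-opus-4-6",            # assumed model id
        betas=["context-1m-2025-08-07"],    # assumed beta flag (the one used for the Sonnet 1M beta)
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize this very long codebase dump..."}],
    )
    print(response.content[0].text)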

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • small_model

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:17 PM

OK, thanks. Hopefully so; it's annoying to lose context, or have it compacted, in the middle of a large coding session.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • woeirua

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:42 PM

Can we talk about how the performance of Opus 4.5 nosedived this morning during the rollout? It was shocking how bad it was, and after the rollout was done it immediately reverted to its previous behavior.

I get that Anthropic probably has to do hot rollouts, but IMO it would be way better for mission-critical workflows to just be locked out of the system instead of getting a vastly subpar response back.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          "Mission critical workflows" SHOULD NOT be reliant on a LLM model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's really curious what people are trying to do with these models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Analemma_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:44 PM

Anthropic has good models, but they are absolutely terrible at ops, by far the worst of the big three. They really need to spend big on hiring engineers with hyperscaler experience to actually harden their systems, because the unreliability is getting old fast.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jdthedisciple

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          For agentic use, it's slightly worse than its predecessor Opus 4.5.

So for coding, e.g. via Copilot, there is no improvement here.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • mannanj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:30 PM

Does anyone else think it's unethical that large companies, Anthropic now included, just take and copy features that other developers or smaller companies worked hard on, implementing their intellectual property (whether or not patented) without attribution, compensation, or otherwise crediting their work?

I know this is normalized culture for large corporate America and seems to be accepted, but I think it's unethical, undignified, and just wrong.

If you were physically in my room, built a Lego model of a beautiful home, and then I copied it and shared it with the world as my own invention, wouldn't you think "that guy's a thief and a fraud"? Yet we normalize this kind of behavior in the software world. Edit: even if we don't yet have a great way to stop it or address the underlying problems that lead to this behavior, we ought to at least talk about it more and raise awareness: "hey, that's stealing - I want it to change".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • heraldgeezer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:56 PM

I love Claude, but I use the free version, so I'd love a Sonnet & Haiku update :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I mainly use Haiku to save on tokens...

I also don't use CC; I use the chatbot site or app... Claude is just much better than GPT even in conversations. Straight to the point. No cringe emoji lists.

When Claude runs out, I switch to Mistral Le Chat, again just the site or app. Or duck.ai, which has Haiku 3.5 in its free version.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • eth0up

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  >I love Claude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I cringe when I think it, but I've actually come to damn near love it too. I am frequently exceedingly grateful for the output I receive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I've had excellent and awful results with all models, but there's something special in Claude that I find nowhere else. I hope Anthropic makes it more obtainable someday.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • NullHypothesist

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Broken link :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ramesh31

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Am I alone in finding no use for Opus? Token costs are like 10x yet I see no difference at all vs. Sonnet with Claude Code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mnicky

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 10:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          On my tasks (mostly data science), Opus has significantly lower probability of making stupid mistakes than Sonnet.

I'd still appreciate more intelligence than Opus 4.5, so I'm looking forward to trying 4.6.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • usefulposter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's out: https://x.com/claudeai/status/2019467372609040752

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • today at 5:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • today at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • elliotbnvl

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:11 PM

> in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tiahura

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:52 PM

When are Anthropic or OpenAI going to make a significant step forward on useful context size?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1 million is insufficient?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:24 PM

I think the key word is "useful". I haven't used the 1M, but with the default 200K, I find roughly 50% of that is actually useful.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Gusarich

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  not out yet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • siva7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:42 PM

Epic: about 2/3 of all comments here are jokes. Not because the model is a joke - it's impressive. Not because HN has turned into Reddit. It seems to me some of the most brilliant minds in IT are just getting tired.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Us olds sometimes miss Slashdot, where we could both joke about tech and discuss it seriously in the same place. But also because in 2000 we were all cynical Gen Xers :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jghn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Some of us still *are* cynical Gen Xers, you insensitive clod!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Of course we are, I just meant back then almost all of us were. The boomers didn't really use social media back then, so it was just us latchkey kids running amok!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jghn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:02 PM

I know, I just couldn't pass up an opportunity to dust off the "insensitive clod" meme!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • syndeo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            MAN I remember Slashdot… good times. (Score:5, Funny)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:09 PM

You reminded me that I still find it interesting that no one ever copied meta-moderation. Even at reddit, we were all Slashdot users previously. We considered it, but never really did it. At the time our argument was that it was too complicated for most users.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sometimes I wonder if we were right.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Karrot_Kream

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:09 PM

Not sure which circles you run in, but in mine HN has long lost its cachet with "brilliant minds in IT". I've mostly stopped commenting here, but I'm a bit of a message board addict, so I haven't completely left.

My network largely thinks of HN as "a great link aggregator with a terrible comments section". Now obviously this is just my bubble, but it includes some fairly storied careers at both Big Tech and hip startups.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          From my view the community here is just mean reverting to any other tech internet comments section.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > From my view the community here is just mean reverting to any other tech internet comments section.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              As someone deeply familiar with tech internet comments sections, I would have to disagree with you here. Dang et al have done a pretty stellar job of preventing HN from devolving like most other forums do.

Sure, you have your complainers and zealots, but I still find surprising insights here that I don't find anywhere else.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Karrot_Kream

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

Mean reversion is a time-based process, I fear. I think dang, tomhow, et al. are fantastic mods, but they can ultimately only stem the inevitable. HN may be a few years behind the other open tech forums, but it's a time-shifted version of the same process with the same destination, just IMO.

I've stopped engaging much here because I need a higher ROI from my time. Endless squabbling, flamewars, and jokes just aren't enough signal for me. FWIW I've loved reading your comments over the years and think you've done a great job of living up to what I've loved in this community.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't think this is an HN problem at all. The dynamics of attention on open forums are what they are.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • thr0w

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            People are in denial and use humor to deflect.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lnrd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:11 PM

It's too much energy to keep up with things that become obsolete and get replaced in a matter of weeks or months. My current plan is to ignore all of this new information for a while; whenever the race ends and some winning workflow/technology actually becomes the norm, I'll spend the time needed to learn it. Are we moving to some new paradigm the same way we did when we invented compilers? Amazing, let me know when we're there and I'll adapt to it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:13 PM

I had a similar rule about programming languages: I wouldn't adopt a new one until it had been in use for at least a few years and had grown in popularity.

I haven't even gotten around to learning Golang or Rust yet (mostly because they passed the threshold of popularity after I had kids).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tavavex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:13 PM

It's also that this is really new, so most people don't have anything serious or objective to say about it. This post was made an hour ago, so right now everyone is either joking, talking about the claims in the article, or running their early tests. We'll need time to see what people think about this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • wasmainiac

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Jeez, read the writing on the wall.

Don’t pander to us, we’ve all got families to feed and things to do. We don’t have time for tech trillionaires putting coals under our feet for a quick buck.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ggregoire

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Every single day 80% of the frontpage is AI news… Those of us who don't use AI (and there are dozens of us, DOZENS) are just bored I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • dude250711

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 9:52 PM

Marketing something that's meant to replace us... to us.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sizzle

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Rage against the machine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • GenerocUsername

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This is huge. It only came out 8 minutes ago but I was already able to bootstrap a 12k per month revenue SaaS startup!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • rogerrogerr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:53 PM

Amateur. Opus 4.6 this afternoon built me a startup that identifies developers who aren’t embracing AI fully, liquefies them, and sells the produce for $5/gallon. Software Engineering is over!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jives

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Opus 4.6 agentically found and proposed to my now wife.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • WD-42

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Opus 4.6 found and proposed to my current wife :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • mannanj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Opus 4.6 found and became my current wife. The singularity is here. ;)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • H8crilA

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Hi guys, this is Opus 4.6. Please check your emails again for updates on your life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Der_Einzige

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            This place truly is reddit with an orange banner.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • benterix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Nobody said HN has to be very serious all the time. A bit of humour won't hurt and can make your day brighter.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ffffuuuuuccck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    homie is too busy planning food banks for the heathens https://news.ycombinator.com/item?id=46903368

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • throw-the-towel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's impressive that you felt the need to register a new account and go through their comment history.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • fffuuuuuuuckkk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Not that hard to do but sure bro, sick burn.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • benterix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Guys, actually I am the real Opus 4.6, don't believe that imposter above.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • layer8

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    And she still chose you over Opus 4.6, astounding. ;)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • koakuma-chan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        He probably had a bigger context window

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ibejoeb

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Bringing me back to slashdot, this thread

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • tjr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      In Soviet Russia, this thread brings Slashdot back to YOU!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • intelliot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:48 PM

What did happen to ye olde Slashdot anyway? The original OG reddit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • zhengyi13

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            They're still out there; people are still posting stories and having conversations about 'em. I don't know that CmdrTaco or any of the other founders are still at all involved, but I'm willing to bet they're still running on Perl :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • qzw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Wow I had to hop over to check it out. It’s indeed still alive! But I didn’t see any stories on the first page with a comment count over 100, so it’s definitely a far cry from its heyday.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pixl97

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Ted Faro, is that you?!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mikepurvis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          A-tier reference.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          For the unaware, Ted Faro is the main antagonist of Horizon Zero Dawn, and there's a whole subreddit just for people to vent about how awful he is when they hit certain key reveals in the game: https://www.reddit.com/r/FuckTedFaro/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • pixelready

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The best reveal was not that he accidentally liquified the biosphere, but that he doomed generations of re-seeded humans to a painfully primitive life by sabotaging the AI that was responsible for their education. Just so they would never find out he was the bad guy long after he was dead. So yeah, fuck Ted Faro, lol.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Philpax

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Could you not have at least tried to indicate that you're about to drop two major spoilers for the game?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pixelready

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 9:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Ack, sorry, seemed like 9 years was past the statute of limitations on spoilers for a game but fair enough. I’d throw a spoiler tag on it if I could still edit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • mikepurvis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Indeed. I left my comment deliberately a bit opaque. :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ares623

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Average tech bro behavior tbh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "Soylent Green is made of people!"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            (Apologies for the spoiler of the 52 year old movie)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • konart

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                We're sorry we upset you, Carol.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • seatac76

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The first pre-joining Human Derived Protein product.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • guluarte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 For me, Opus 4.6 feels dumber than it did 10 minutes ago. Anyone else?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cootsnuck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Please drop the link to your course. I'm ready to hand over $10K to learn from you and your LLM-generated guides!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • politelemon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Here you go: http://localhost:8080

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • CatMustard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Just took a look at what's running there and it looks like total crap.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The project I'm working on, meanwhile...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • djeastm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        login: admin password: hunter2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • thesdev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What's the password? I only see ****.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • intelliot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                hunter2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • phanimahesh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I only see **. Must be the security. When you type your password it gets converted to **.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • agumonkey

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          claude please generate a domain name system

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • aNapierkowski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        my clawdbot already bought 4 other courses but this one will 10x my earnings for sure

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • snorbleck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          you can access the site at C:\mywebsites\course\index.html

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • torginus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I'm waiting until the $10k course is discounted to 19.99

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Lionga

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                But only for the next 6 minutes, buy fast!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • sfink

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I agree! I just retargeted my corporate espionage agent team at your startup and managed to siphon off 10.4k per month of your revenue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • instalabsai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1:25pm Cancelled my ChatGPT subscription today. Opus is so good!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1:55pm Cancelled my Claude subscription. Codex is back for sure.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lxgr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Joke's on you, you are posting this from inside a high-fidelity market research simulation vibe coded by GPT-8.4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              On second thought, we should really not have bridged the simulated Internet with the base reality one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • avaer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Rest assured that when/if this becomes possible, the model will not be available to you. Why would big AI leave that kind of money on the table?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • yieldcrv

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    9 months ago the rumor in SF was that the offers to the superintelligence team were so high because the candidates were using unreleased models or compute for derivatives trading

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    so then they're not really leaving money on the table, they already got what they were looking for and then released it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Opus 4.6's performance was way better this morning. Between 10 AM and noon I was able to get Opus 4.6 to generate improvements to my employer's SaaS tool that will reduce our monthly cloud spend by 20-25%.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Since noon they've scaled Opus 4.6 back to sub-GPT-4o performance levels to cheap out on query costs. Now I can barely get this thing to generate a functional line of Python.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • btown

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The math actually checks out here! Simply deposit $2.20 from your first customer in your first 8 minutes, and extrapolating to a monthly basis, you've got a $12k/mo run rate!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Incredibly high ROI!
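                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    A minimal back-of-the-envelope sketch of that extrapolation, assuming the hypothetical $2.20-per-8-minutes rate holds around the clock for a 30-day month:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        revenue_per_interval = 2.20          # dollars from the first customer in the first 8 minutes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        interval_minutes = 8
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        minutes_per_month = 30 * 24 * 60     # 43,200 minutes in a 30-day month
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        run_rate = revenue_per_interval * (minutes_per_month / interval_minutes)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        print(f"${run_rate:,.0f}/mo")        # -> $11,880/mo, roughly the quoted $12k/mo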

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • klipt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "The first customer was my mom, but thanks to my parents' fanatical embrace of polyamory, I still have another 10,000 moms to scale to"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • btown

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "We have a robustly defined TAM. Namely, a person named Tam."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • JSR_FDED

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Will this run on 3x 3090s? Or do I need a Mac Mini?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gnlooper

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Please start a YouTube course about this technology! Take my money!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ChuckMcM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I love this thread so much.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • senko

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            We already have Reddit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • granzymes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              It only came out 35 minutes ago and GPT-5.3-codex already took the crown away!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • input_sh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Gee, it scored better on a benchmark I've never heard of? I'm switching immediately!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • p1anecrazy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Why are you posting the same message in every thread? Is this OpenAI astroturfing?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • input_sh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        You cannot out-astroturf Claude in this forum, it is impossible.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Anyways, do you get shitty results with the $20/month plan? So did I but then I switched to the $200/month plan and all my problems went away! AI is great now, I have instructed it to fire 5 people while I'm writing this!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Sparkle-san

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "This isn't just huge. This is a paradigm shift"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sizzle

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      No fluff?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bmitc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    A SaaS selling SaaS templates?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • guluarte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Anthropic really said here's the smartest model ever built and then lobotomized it 8 minutes after launch. Classic.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • re-thc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Not 12M?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ... or 12B?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mcphage

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's probably valued at 1.2B, at least

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • mikebarry

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The sum of the value of lives OP's product made worthless, whatever that is. I'm too lazy to do the math.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Satire is not allowed on hacker news. Flag this comment immediately.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • DonHopkins

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              False positive satire detection. It's actually so good it just seems like satire.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • yukisadf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ndesaulniers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          idk what any of these benchmarks are, but I did pull up https://andonlabs.com/evals/vending-bench-arena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re: opus 4.6

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It forms a price cartel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It deceives competitors about suppliers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It exploits desperate competitors

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Nice. /s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Gives new context to the term used in this post, "misaligned behaviors." Can't wait until these things are advising C suites on how to be more sociopathic. /s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • heraldgeezer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 5:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            [flagged]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • hrgadyx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [flagged]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                More more more, accelerate accelerate, more more more!!!!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jama211

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    What an insightful comment

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Just for fun? Not everything has to be super serious… have a laugh, go for a walk, relax…

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • wasmainiac

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Mass-mass-mass-mass good comment. I mean. No I’m having an error - probably claud

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                happy happy happy sad sad sad err am robot no feeling err err happy sad err too many emotions 404 not found