
Claude Opus 4.6

1274 points - today at 5:38 PM

Source
  • ck_one

    today at 9:38 PM

    Just tested the new Opus 4.6 (1M context) on a fun needle-in-a-haystack challenge: finding every spell in all Harry Potter books.

    All 7 books come to ~1.75M tokens, so they don't quite fit yet. (At this rate of progress, mid-April should do it.) For now you can fit the first 4 books (~733K tokens).

    Results: Opus 4.6 found 49 out of 50 officially documented spells across those 4 books. The only miss was "Slugulus Eructo" (a vomiting spell).

    Freaking impressive!
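
    (For anyone who wants to sanity-check the token math, a minimal sketch using tiktoken, an OpenAI tokenizer, so the counts will differ somewhat from Claude's own tokenizer; the file path is hypothetical.)

      import tiktoken

      # Rough token count for a local text file; treat the number as an
      # approximation of whether the text fits in a 1M-token context window.
      enc = tiktoken.get_encoding("cl100k_base")
      with open("hp_books_1_to_4.txt", encoding="utf-8") as f:  # hypothetical file
          text = f.read()
      print(f"~{len(enc.encode(text)):,} tokens")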

      • golfer

        today at 10:41 PM

        There are lots of websites that list the spells. It's well documented. Could Claude simply be regurgitating knowledge from the web? Example:

        https://harrypotter.fandom.com/wiki/List_of_spells

          • ck_one

            today at 10:44 PM

            It didn't use web search. But for sure it has some internal knowledge already. It's not a perfect needle-in-the-haystack problem, but Gemini Flash was much worse when I tested it last time.

        • xiomrze

          today at 10:06 PM

          Honest question: how do you know whether it's pulling from context vs. from memory?

          If I use Opus 4.6 with Extended Thinking (Web Search disabled, no books attached), it answers with 130 spells.

            • petercooper

              today at 10:18 PM

              One possible trick could be to search-and-replace them all with nonsense alternatives, then see if it extracts those.
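
              A minimal sketch of that idea, assuming the books are in a local text file and you hand-build the replacement list (the filenames and all the nonsense names here are hypothetical):

                import re

                # Map real spell names to nonsense tokens, then ask the model to list
                # the "spells" in the rewritten text; hits on the nonsense names suggest
                # it is reading the context rather than reciting memorized spell lists.
                replacements = {
                    "Expelliarmus": "Fliggle Wonk",
                    "Wingardium Leviosa": "Borple Snozz",
                    "Expecto Patronum": "Quimble Drast",
                }

                with open("hp_books_1_to_4.txt", encoding="utf-8") as f:  # hypothetical file
                    text = f.read()

                for real, fake in replacements.items():
                    text = re.sub(re.escape(real), fake, text, flags=re.IGNORECASE)

                with open("hp_books_scrambled.txt", "w", encoding="utf-8") as f:
                    f.write(text)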

                • andai

                  today at 10:29 PM

                  That might actually boost performance since attention pays attention to stuff that stands out. If I make a typo, the models often hyperfixate on it.

              • ck_one

                today at 10:34 PM

                When I tried it without web search, so only internal knowledge, it missed ~15 spells.

                • ozim

                  today at 10:24 PM

                  Exactly. There was this study where they were trying to make an LLM reproduce the HP books word for word, giving it the first sentences and letting it cook.

                  Basically, with some tricks they managed to get 99% word for word. The tricks were needed to bypass the security measures that are in place for exactly this reason: to stop people from retrieving training material.

                    • ck_one

                      today at 10:35 PM

                      Do you remember what those tricks were?

                  • clanker_fluffer

                    today at 10:16 PM

                    What was your prompt?

                • meroes

                  today at 10:06 PM

                  What is this supposed to show exactly? Those books have been fed into LLMs for years, and there's likely even specific RLHF on extracting spells from HP.

                    • muzani

                      today at 10:33 PM

                      There was a time when I put the Ea-nasir text into base64 and asked an AI to convert it. Remarkably, it identified the correct text but pulled the most popular translation rather than the one I gave it.
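
                      Roughly what that test looks like (the passage wording here is a placeholder, not the actual tablet translation):

                        import base64

                        # Encode your own wording of the passage and ask the model to decode it.
                        # If the reply matches the popular published translation instead of the
                        # exact text you encoded, it is likely leaning on memorized knowledge.
                        passage = "Tell Ea-nasir: Nanni sends the following message ..."  # placeholder wording
                        encoded = base64.b64encode(passage.encode("utf-8")).decode("ascii")
                        print(encoded)  # paste this into the chat and compare the decoded output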

                      • rvz

                        today at 10:24 PM

                        > What is this supposed to show exactly?

                        Nothing.

                        You can be sure that this was already known from the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like what the OP just did are meaningless.

                        Such "benchmarks" are performative to VCs and they do not ask why isn't the research and testing itself done independently but is almost always done by their own in-house researchers.

                    • bartman

                      today at 10:32 PM

                      Have you by any chance tried this with GPT 4.1 too (also 1M context)?

                      • LanceJones

                        today at 10:34 PM

                        Assuming this experiment involved isolating the LLM from its training set?

                        • zamadatix

                          today at 9:51 PM

                          To be fair, I don't think "Slugulus Eructo" (the name) is actually in the books. This is what's in my copy:

                          > The smug look on Malfoy’s face flickered.

                          > “No one asked your opinion, you filthy little Mudblood,” he spat.

                          > Harry knew at once that Malfoy had said something really bad because there was an instant uproar at his words. Flint had to dive in front of Malfoy to stop Fred and George jumping on him, Alicia shrieked, “How dare you!”, and Ron plunged his hand into his robes, pulled out his wand, yelling, “You’ll pay for that one, Malfoy!” and pointed it furiously under Flint’s arm at Malfoy’s face.

                          > A loud bang echoed around the stadium and a jet of green light shot out of the wrong end of Ron’s wand, hitting him in the stomach and sending him reeling backward onto the grass.

                          > “Ron! Ron! Are you all right?” squealed Hermione.

                          > Ron opened his mouth to speak, but no words came out. Instead he gave an almighty belch and several slugs dribbled out of his mouth onto his lap.

                            • ck_one

                              today at 10:30 PM

                              Then it's fair that it didn't find it.

                          • guluarte

                            today at 10:19 PM

                            You can get the same result just by asking Opus/GPT; it's probably internalized knowledge from Reddit or similar sites.

                              • ck_one

                                today at 10:33 PM

                                If you just ask it you don't get the same result. Around 13 spells were missing when I just prompted Opus 4.6 without the books as context.

                            • hbarka

                              today at 10:08 PM

                              If you wanted to fit all 7 books, would you use RAG or another solution?

                              • adarsh2321

                                today at 10:40 PM

                                [dead]

                            • gizmodo59

                              today at 6:14 PM

                              5.3 Codex (https://openai.com/index/introducing-gpt-5-3-codex/) crushes it with 77.3% on Terminal-Bench. The shortest-lived lead yet: less than 35 minutes. What a time to be alive!

                                • wasmainiac

                                  today at 7:10 PM

                                  Dumb question: can these benchmarks be trusted when model performance tends to vary depending on the hour and the load on OpenAI's servers? How do I know I'm not getting a severe penalty for chatting at the wrong time? Or even, are the models at their best right after launch and then slowly eroded to more economical settings after the hype wears off?

                                    • tedsanders

                                      today at 8:09 PM

                                      We don't vary our model quality with time of day or load (beyond negligible non-determinism). It's the same weights all day long with no quantization or other gimmicks. They can get slower under heavy load, though.

                                      (I'm from OpenAI.)

                                        • wasmainiac

                                          today at 10:50 PM

                                          Thanks for the response, I appreciate it. I do notice variation in quality throughout the day. I use it primarily for searching documentation since it's faster than Google in most cases; often it is on point, but it also seems off at times, inaccurate or shallow maybe. In some cases I just end the session.

                                          • zamadatix

                                            today at 9:43 PM

                                            I appreciate you taking the time to respond to these kinds of questions the last few days.

                                            • Trufa

                                              today at 8:18 PM

                                              Can you be more specific than this? Does it vary over time, from the launch of a model to the next few months, beyond tinkering and optimization?

                                                • tedsanders

                                                  today at 9:11 PM

                                                  Yeah, happy to be more specific. No intention of making any technically true but misleading statements.

                                                  The following are true:

                                                  - In our API, we don't change model weights or model behavior over time (e.g., by time of day, or weeks/months after release)

                                                  - Tiny caveats include: there is a bit of non-determinism in batched non-associative math that can vary by batch / hardware, bugs or API downtime can obviously change behavior, heavy load can slow down speeds, and this of course doesn't apply to the 'unpinned' models that are clearly supposed to change over time (e.g., xxx-latest). But we don't do any quantization or routing gimmicks that would change model weights.

                                                  - In ChatGPT and Codex CLI, model behavior can change over time (e.g., we might change a tool, update a system prompt, tweak default thinking time, run an A/B test, or ship other updates); we try to be transparent with our changelogs (listed below) but to be honest not every small change gets logged here. But even here we're not doing any gimmicks to cut quality by time of day or intentionally dumb down models after launch. Model behavior can change though, as can the product / prompt / harness.

                                                  ChatGPT release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-...

                                                  Codex changelog: https://developers.openai.com/codex/changelog/

                                                  Codex CLI commit history: https://github.com/openai/codex/commits/main/
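
                                                  (A small illustration of the "non-determinism in batched non-associative math" caveat above; this is generic floating-point behavior, not OpenAI's code. Summing the same numbers in a different order, as can happen with different batch compositions or hardware, gives slightly different results.)

                                                    import random

                                                    random.seed(0)
                                                    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

                                                    forward = sum(xs)                 # one summation order
                                                    backward = sum(reversed(xs))      # same numbers, different order

                                                    print(forward == backward)        # typically False
                                                    print(abs(forward - backward))    # tiny but nonzero difference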

                                                    • jychang

                                                      today at 9:30 PM

                                                      What about the juice variable?

                                                      https://www.reddit.com/r/OpenAI/comments/1qv77lq/chatgpt_low...

                                                        • tedsanders

                                                          today at 9:43 PM

                                                          Yep, we recently sped up default thinking times in ChatGPT, as now documented in the release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-...

                                                          The intention was purely making the product experience better, based on common feedback from people (including myself) that wait times were too long. Cost was not a goal here.

                                                          If you still want the higher reliability of longer thinking times, that option is not gone. You can manually select Extended (or Heavy, if you're a Pro user). It's the same as at launch (though we did inadvertently drop it last month and restored it yesterday after Tibor and others pointed it out).

                                                          • tgrowazay

                                                            today at 9:40 PM

                                                            Isn’t that just how many steps at most a reasoning model should do?

                                                        • ComplexSystems

                                                          today at 9:22 PM

                                                          Do you ever replace ChatGPT models with cheaper, distilled, quantized, etc ones to save cost?

                                                            • jghn

                                                              today at 9:28 PM

                                                              He literally said no to this in his GP post

                                                      • joshvm

                                                        today at 8:58 PM

                                                        My gut feeling is that performance is more heavily affected by harnesses which get updated frequently. This would explain why people feel that Claude is sometimes more stupid - that's actually accurate phrasing, because Sonnet is probably unchanged. Unless Anthropic also makes small A/B adjustments to weights and technically claims they don't do dynamic degradation/quantization based on load. Either way, both affect the quality of your responses.

                                                        It's worth checking different versions of Claude Code, and updating your tools if you don't do it automatically. Also run the same prompts through VS Code, Cursor, Claude Code in terminal, etc. You can get very different model responses based on the system prompt, what context is passed via the harness, how the rules are loaded and all sorts of minor tweaks.

                                                        If you make raw API calls and see behavioural changes over time, that would be another concern.

                                                    • Someone1234

                                                      today at 8:29 PM

                                                      Specifically including routing (i.e. which model you route to based on load/ToD)?

                                                      PS - I appreciate you coming here and commenting!

                                                        • hhh

                                                          today at 8:32 PM

                                                          There is no routing with the API, or when you choose a specific model in ChatGPT.

                                                  • Corence

                                                    today at 7:31 PM

                                                    It is a fair question. I'd expect the numbers are all real. Competitors are going to rerun the benchmark with these models to see how the model responds and succeeds on the tasks, and use that information to figure out how to improve their own models. If the benchmark numbers aren't real, their competitors will call out that they're not reproducible.

                                                    However it's possible that consumers without a sufficiently tiered plan aren't getting optimal performance, or that the benchmark is overfit and the results won't generalize well to the real tasks you're trying to do.

                                                      • mrandish

                                                        today at 10:22 PM

                                                        > I'd expect the numbers are all real.

                                                      I think a lot of people are concerned due to 1) significant variance in performance being reported by a large number of users, and 2) specific examples of OpenAI and other labs benchmaxxing in the recent past (https://grok.com/share/c2hhcmQtMw_66c34055-740f-43a3-a63c-4b...).

                                                        It's tricky because there are so many subtle ways in which "the numbers are all real" could be technically true in some sense, yet still not reflect what a customer will experience (eg harnesses, etc). And any of those ways can benefit the cost structures of companies currently subsidizing models well below their actual costs with limited investor capital. All with billions of dollars in potential personal wealth at stake for company employees and dozens of hidden cost/performance levers at their disposal.

                                                      And it doesn't even require overt deception on anyone's part. For example, the teams doing benchmark testing of unreleased new models aren't the same people as the ops teams managing global deployment/load balancing at scale day-to-day. If there aren't significant ongoing resources devoted to specifically validating that those two things remain in sync, they'll almost certainly drift apart. And it won't be anyone's job to even know it's happening until a meaningful number of important customers complain or sales start to fall. Of course, if an unplanned deviation causes costs to rise over budget, it's a high-priority bug to be addressed. But if the deviation goes the other way and costs are a little lower than expected, no one's getting a late-night incident alert. This isn't even a dig at OpenAI in particular; it's just the default state of how large orgs work.

                                                    • ifwinterco

                                                      today at 7:54 PM

                                                      On benchmarks GPT 5.2 was roughly equivalent to Opus 4.5 but most people who've used both for SWE stuff would say that Opus 4.5 is/was noticeably better

                                                        • CraigJPerry

                                                          today at 8:57 PM

                                                        There's an extended thinking mode for GPT 5.2; I forget the name of it right at this minute. It's super slow - a 3-minute Opus 4.5 prompt takes circa 12 minutes to complete in 5.2 on that super extended thinking mode - but it is not a close race in terms of results: GPT 5.2 wins by a handy margin in that mode. It's just too slow to be usable interactively, though.

                                                            • ifwinterco

                                                              today at 9:55 PM

                                                              Interesting, sounds like I definitely need to give the GPT models another proper go based on this discussion

                                                          • georgeven

                                                            today at 8:04 PM

                                                            Interesting. Everyone in my circle said the opposite.

                                                              • krzyk

                                                                today at 8:53 PM

                                                                It probably depends on programming language and expectations.

                                                                  • ifwinterco

                                                                    today at 9:51 PM

                                                                    This is mostly Python/TS for me... what Jonathan Blow would probably call not "real programming" but it pays the bills

                                                              They can both write fairly good idiomatic code, but in my experience Opus 4.5 is better at understanding overall project structure etc. without prompting. It just does things correctly the first time more often than Codex. I still don't trust it, obviously, but out of all LLMs it's the closest to actually starting to earn my trust.

                                                            • SatvikBeri

                                                              today at 10:05 PM

                                                              I pretty consistently heard people say Codex was much slower but produced better results, making it better for long-running work in the background, and worse for more interactive development.

                                                              • elAhmo

                                                                today at 8:00 PM

                                                                I mostly used Sonnet/Opus 4.x in the past months, but 5.2 Codex seemed to be on par or better for my use case in the past month. I tried a few models here and there but always went back to Claude, but with 5.2 Codex for the first time I felt it was very competitive, if not better.

                                                                Curious to see how things will be with 5.3 and 4.6

                                                            • smcleod

                                                              today at 9:37 PM

                                                              I don't think much from OpenAI can be trusted tbh.

                                                              • aaaalone

                                                                today at 7:19 PM

                                                              At the end of the day you test it on your own use cases anyway, but it's a great initial hint as to whether it's worth testing out.

                                                                • cyanydeez

                                                                  today at 7:36 PM

                                                                  When do you think we should run this benchmark? Friday, 1pm? Monday 8AM? Wednesday 11AM?

                                                                  I definitely suspect all these models are being degraded during heavy loads.

                                                                    • j_maffe

                                                                      today at 7:49 PM

                                                                      This hypothesis is tested regularly by plenty of live benchmarks. The services usually don't decay in performance.

                                                                  • thinkingtoilet

                                                                    today at 9:22 PM

                                                                  We know OpenAI already got caught getting benchmark data and tuning their models to it. So the answer is a hard no. I imagine over time it gives a general view of the landscape and improvements, but take it with a large grain of salt.

                                                                      • rvz

                                                                        today at 10:37 PM

                                                                    The same thing was done by Meta researchers with Llama 4; it shows what can go wrong when 'independent' researchers begin to game AI benchmarks. [0]

                                                                        You always have to question these benchmarks, especially when the in-house researchers can potentially game them if they wanted to.

                                                                        Which is why it must be independent.

                                                                        [0] https://gizmodo.com/meta-cheated-on-ai-benchmarks-and-its-a-...

                                                                • purplerabbit

                                                                  today at 6:29 PM

                                                                  The lack of broad benchmark reports in this makes me curious: Has OpenAI reverted to benchmaxxing? Looking forward to hearing opinions once we all try both of these out

                                                                    • MallocVoidstar

                                                                      today at 7:24 PM

                                                                      The -codex models are only for 'agentic coding', nothing else.

                                                                        • dingnuts

                                                                          today at 7:33 PM

                                                                          [dead]

                                                                  • nharada

                                                                    today at 6:23 PM

                                                                    That's a massive jump, I'm curious if there's a materially different feeling in how it works or if we're starting to reach the point of benchmark saturation. If the benchmark is good then 10 points should be a big improvement in capability...

                                                                    • jkelleyrtp

                                                                      today at 6:27 PM

                                                                  Claude's SWE-bench is 80.8 and Codex's is 56.8.

                                                                      Seems like 4.6 is still all-around better?

                                                                        • gizmodo59

                                                                          today at 6:28 PM

                                                                    It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.

                                                                            • joshuahedlund

                                                                              today at 6:30 PM

                                                                              Any ideas why verified has stagnated? It was increasing rapidly and then basically stopped.

                                                                                • Snuggly73

                                                                                  today at 6:53 PM

                                                                        It has been pretty much a benchmark for memorization for a while; there is a paper on the subject somewhere.

                                                                        SWE-bench Pro public is newer, but it's not live, so it will slowly get memorized as well. The private dataset is more interesting, as are the results there:

                                                                                  https://scale.com/leaderboard/swe_bench_pro_private

                                                                          • Rudybega

                                                                            today at 9:59 PM

                                                                            You're comparing two different benchmarks. Pro vs Verified.

                                                                    • pjot

                                                                      today at 6:03 PM

                                                                      Claude Code release notes:

                                                                        > Version 2.1.32:
                                                                           • Claude Opus 4.6 is now available!
                                                                           • Added research preview agent teams feature for multi-agent collaboration (token-intensive feature, requires setting CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1)
                                                                           • Claude now automatically records and recalls memories as it works
                                                                           • Added "Summarize from here" to the message selector, allowing partial conversation summarization.
                                                                           • Skills defined in .claude/skills/ within additional directories (--add-dir) are now loaded automatically.
                                                                           • Fixed @ file completion showing incorrect relative paths when running from a subdirectory
                                                                           • Updated --resume to re-use --agent value specified in previous conversation by default.
                                                                           • Fixed: Bash tool no longer throws "Bad substitution" errors when heredocs contain JavaScript template literals like ${index + 1}, which previously interrupted tool execution
                                                                           • Skill character budget now scales with context window (2% of context), so users with larger context windows can see more skill descriptions without truncation
                                                                           • Fixed Thai/Lao spacing vowels (สระ า, ำ) not rendering correctly in the input field
                                                                           • VSCode: Fixed slash commands incorrectly being executed when pressing Enter with preceding text in the input field
                                                                           • VSCode: Added spinner when loading past conversations list

                                                                        • neuronexmachina

                                                                          today at 6:08 PM

                                                                          > Claude now automatically records and recalls memories as it works

                                                                          Neat: https://code.claude.com/docs/en/memory

                                                                          I guess it's kind of like Google Antigravity's "Knowledge" artifacts?

                                                                            • bityard

                                                                              today at 7:46 PM

                                                                              If it works anything like the memories on Copilot (which have been around for quite a while), you need to be pretty explicit about it being a permanent preference for it to be stored as a memory. For example, "Don't use emoji in your response" would only be relevant for the current chat session, whereas this is more sticky: "I never want to see emojis from you, you sub-par excuse for a roided-out spreadsheet"

                                                                                • flutas

                                                                                  today at 9:10 PM

                                                                                  It's a lot more iffy than that IME.

                                                                                  It's very happy to throw a lot into the memory, even if it doesn't make sense.

                                                                                  • 9dev

                                                                                    today at 8:56 PM

                                                                                    > you sub-par excuse for a roided-out spreadsheet

                                                                                    That’s harsh, man.

                                                                                • 4b11b4

                                                                                  today at 9:23 PM

                                                                                  I understand everyone's trying to solve this problem but I'm envisioning 1 year down the line when your memory is full of stuff that shouldn't be in there.

                                                                                  • om8

                                                                                    today at 6:23 PM

                                                                                     Is there a way to disable it? Sometimes I value the agent not having the knowledge it needs to cut corners.

                                                                                      • nerdsniper

                                                                                        today at 7:19 PM

                                                                                        90-98% of the time I want the LLM to only have the knowledge I gave it in the prompt. I'm actually kind of scared that I'll wake up one day and the web interface for ChatGPT/Opus/Gemini will pull information from my prior chats.

                                                                                          • pdntspa

                                                                                            today at 9:27 PM

                                                                                            They already do this

                                                                                             I've had Claude reference prior conversations when I'm trying to get technical help on thing A, and it will ask me if this conversation is because of thing B that we talked about in the immediate past.

                                                                                            • vineyardmike

                                                                                              today at 9:18 PM

                                                                                               All of these providers support this feature. I don't know about ChatGPT, but the rest are opt-in. I imagine with Gemini it'll be default-on soon enough, since it's consumer focused. Claude does constantly nag me to enable it though.

                                                                                              • hypercube33

                                                                                                today at 7:48 PM

                                                                                                I'm fairly sure OpenAI/GPT does pull prior information in the form of its memories

                                                                                                  • nerdsniper

                                                                                                    today at 7:50 PM

                                                                                                    Ah, that could explain why I've found myself using it the least.

                                                                                                • sharifhsn

                                                                                                  today at 7:53 PM

                                                                                                  Gemini has this feature but it’s opt-in.

                                                                                              • kzahel

                                                                                                today at 8:17 PM

                                                                                                Claude told me he can disable it by putting instructions in the MEMORY.md file to not use it. So only a soft disable AFAIK and you'd need to do it on each machine.

                                                                                            • pdntspa

                                                                                              today at 9:26 PM

                                                                                              I thought it was already doing this?

                                                                                              I asked Claude UI to clear its memory a little while back and hoo boy CC got really stupid for a couple of days

                                                                                              • codethief

                                                                                                today at 6:26 PM

                                                                                                Are we sure the docs page has been updated yet? Because that page doesn't say anything about automatic recording of memories.

                                                                                                  • neuronexmachina

                                                                                                    today at 7:39 PM

                                                                                                    Oh, quite right. I saw people mention MEMORY.md online and I assumed that was the doc for it, but it looks like it isn't.

                                                                                                • kzahel

                                                                                                  today at 8:20 PM

                                                                                                  I looked into it a bit. It stores memories near where it stores the JSONL session history. It's per-project (and specific to the machine). Claude pretty aggressively and frequently writes stuff in there. It uses MEMORY.md as sort of the index, and will write out other files for other topics (linking to them from the main MEMORY.md file).

                                                                                                  It gives you a convenient way to say "remember this bug for me, we should fix tomorrow". I'll be playing around with it more for sure.

                                                                                                  I asked Claude to give me a TLDR (condensed from its system prompt):

                                                                                                  ----

                                                                                                  Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations

                                                                                                  MEMORY.md is always injected into the system prompt; truncated after 200 lines, so keep it concise

                                                                                                  Separate topic files for detailed notes, linked from MEMORY.md

                                                                                                  What to record: problem constraints, strategies that worked/failed, lessons learned

                                                                                                  Proactive: when I hit a common mistake, check memory first - if nothing there, write it down

                                                                                                  Maintenance: update or remove memories that are wrong or outdated

                                                                                                  Organization: by topic, not chronologically

                                                                                                  Tools: use Write/Edit to update (so you always see the tool calls)
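
                                                                                                  A minimal sketch of reading that index, going only by the layout described above (the path and the 200-line truncation are as reported in this comment, not verified against the docs; the project folder name is hypothetical):

                                                                                                    from pathlib import Path

                                                                                                    # Per-project memory directory as described above (hypothetical project path).
                                                                                                    memory_dir = Path.home() / ".claude" / "projects" / "my-project" / "memory"
                                                                                                    index = memory_dir / "MEMORY.md"

                                                                                                    if index.exists():
                                                                                                        lines = index.read_text(encoding="utf-8").splitlines()
                                                                                                        # Reportedly only the first ~200 lines get injected into the prompt.
                                                                                                        print("\n".join(lines[:200]))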

                                                                                                    • ra7

                                                                                                      today at 10:18 PM

                                                                                                      > Persistent directory at ~/.claude/projects/{project-path}/memory/, persists across conversations

                                                                                                      I create a git worktree, start Claude Code in that tree, and delete after. I notice each worktree gets a memory directory in this location. So is memory fragmented and not combined for the "main" repo?

                                                                                          • simonw

                                                                                            today at 5:58 PM

                                                                                            The bicycle frame is a bit wonky but the pelican itself is great: https://gist.github.com/simonw/a6806ce41b4c721e240a4548ecdbe...

                                                                                              • stkai

                                                                                                today at 6:56 PM

                                                                                                Would love to find out they're overfitting for pelican drawings.

                                                                                                  • andy_ppp

                                                                                                    today at 7:16 PM

                                                                                                    Yes. Raccoon on a unicycle? Magpie on a pedalo?

                                                                                                  • theanonymousone

                                                                                                    today at 9:44 PM

                                                                                                    Even if not intentionally, it is probably leaking into training sets.

                                                                                                    • fragmede

                                                                                                      today at 7:54 PM

                                                                                                      The estimation I did 4 months ago:

                                                                                                      > there are approximately 200k common nouns in English, and then we square that, we get 40 billion combinations. At one second per, that's ~1200 years, but then if we parallelize it on a supercomputer that can do 100,000 per second that would only take 3 days. Given that ChatGPT was trained on all of the Internet and every book written, I'm not sure that still seems infeasible.

                                                                                                      https://news.ycombinator.com/item?id=45455786
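
                                                                                                      Quick check of the arithmetic in that quote (the 100,000-per-second throughput is the commenter's assumption):

                                                                                                        nouns = 200_000
                                                                                                        pairs = nouns ** 2                  # 40 billion noun-noun combinations

                                                                                                        seconds_per_year = 60 * 60 * 24 * 365
                                                                                                        print(pairs / seconds_per_year)     # ~1268 years at 1 combination/second

                                                                                                        rate = 100_000                      # assumed combinations per second
                                                                                                        print(pairs / rate / 86_400)        # roughly 4-5 days at that throughput

                                                                                                        print(pairs * 100)                  # ~4 trillion once ~100 prepositions are included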

                                                                                                        • eli

                                                                                                          today at 8:07 PM

                                                                                                          How would you generate a picture of Noun + Noun in the first place in order to train the LLM with what it would look like? What's happening during that 1 estimated second?

                                                                                                            • metalliqaz

                                                                                                              today at 9:51 PM

                                                                                                              it's pelicans all the way down

                                                                                                              • Terretta

                                                                                                                today at 8:39 PM

                                                                                                                This is why everyone trains their LLM on another LLM. It's all about the pelicans.

                                                                                                            • AnimalMuppet

                                                                                                              today at 9:28 PM

                                                                                                              But you need to also include the number of prepositions. "A pelican on a bicycle" is not at all the same as "a pelican inside a bicycle".

                                                                                                              There are estimated to be 100 or so prepositions in English. That gets you to 4 trillion combinations.

                                                                                                      • gcanyon

                                                                                                        today at 6:44 PM

                                                                                                        One aspect of this is that apparently most people can't draw a bicycle much better than this: they get the elements of the frame wrong, mess up the geometry, etc.

                                                                                                          • arionmiles

                                                                                                            today at 8:08 PM

                                                                                                            There's a research paper from the University of Liverpool, published in 2006, where researchers asked people to draw bicycles from memory, showing how people overestimate their understanding of basic things. It was a very fun and short read.

                                                                                                            It's called "The science of cycology: Failures to understand how everyday objects work" by Rebecca Lawson.

                                                                                                            https://link.springer.com/content/pdf/10.3758/bf03195929.pdf

                                                                                                              • devilcius

                                                                                                                today at 10:02 PM

                                                                                                                There’s also a great art/design project about exactly this. Gianluca Gimini asked hundreds of people to draw a bicycle from memory, and most of them got the frame, proportions, or mechanics wrong. https://www.gianlucagimini.it/portfolio-item/velocipedia/

                                                                                                                • rcxdude

                                                                                                                  today at 8:32 PM

                                                                                                                  A place I worked at used it as part of an interview question (it wasn't some pass/fail thing to get it 100% correct, and was partly a jumping off point to a different question). This was in a city where nearly everyone uses bicycles as everyday transportation. It was surprising how many supposedly mechanical-focused people who rode a bike everyday, even rode a bike to the interview, would draw a bike that would not work.

                                                                                                                    • gcanyon

                                                                                                                      today at 9:54 PM

                                                                                                                      I wish I had interviewed there. When I first read that people have a hard time with this I immediately sat down without looking at a reference and drew a bicycle. I could ace your interview.

                                                                                                                      • throwuxiytayq

                                                                                                                        today at 9:15 PM

                                                                                                                        This is why at my company in interviews we ask people to draw a CPU diagram. You'd be surprised how many supposedly-senior computer programmers would draw a processor that would not work.

                                                                                                                          • niobe

                                                                                                                            today at 9:25 PM

                                                                                                                            If I was asked that question in an interview to be a programmer I'd walk out. How many abstraction layers either side of your knowledge domain do you need to be an expert in? Further, being a good technologist of any kind is not about having arcane details at the tip of your frontal lobe, and a company worth working for would know that.

                                                                                                                            • gedy

                                                                                                                              today at 9:26 PM

                                                                                                                              That's reasonable in many cases, but I've had situations like this for senior UI and frontend positions where they don't ask UI or frontend questions, just their pet low-level questions. Some even snort that it's softball to ask UI questions or "they use whatever". It's like, yeah, no wonder your UI is shit and now you are hiring to clean it up.

                                                                                                                              • rsc

                                                                                                                                today at 9:46 PM

                                                                                                                                Raises hand.

                                                                                                                    • gnatolf

                                                                                                                      today at 7:03 PM

                                                                                                                      Absolutely. A technically correct bike is very hard to draw in SVG without going overboard in details

                                                                                                                    • nateglims

                                                                                                                      today at 8:33 PM

                                                                                                                      I just had an idea for an RLVR startup.

                                                                                                                      • cyanydeez

                                                                                                                        today at 7:32 PM

                                                                                                                        Yes, but obviously AGI will solve this by, _checks notes_ more TerraWatts!

                                                                                                                          • hackernudes

                                                                                                                            today at 7:43 PM

                                                                                                                            The word is terawatts unless you mean earth-based watts. OK then, it's confirmed, data centers in space!

                                                                                                                            • seanhunter

                                                                                                                              today at 7:41 PM

                                                                                                                              …in space!

                                                                                                                      • franze

                                                                                                                        today at 8:43 PM

                                                                                                                        Here's the animated version: https://claude.ai/public/artifacts/3db12520-eaea-4769-82be-7...

                                                                                                                          • gryfft

                                                                                                                            today at 8:48 PM

                                                                                                                            That's hilarious. It's so close!

                                                                                                                        • etwigg

                                                                                                                          today at 10:00 PM

                                                                                                                          If we do get paperclipped, I hope it is of the "cycling pelican" variety. Thanks for your important contribution to alignment Simon!

                                                                                                                          • einrealist

                                                                                                                            today at 6:10 PM

                                                                                                                            They trained for it. That's the +0.1!

                                                                                                                            • zahlman

                                                                                                                              today at 9:10 PM

                                                                                                                              Do you find that word choices like "generate" (as opposed to "create", "author", "write" etc.) influence the model's success?

                                                                                                                              Also, is it bad that I almost immediately noticed that both of the pelican's legs are on the same side of the bicycle, but I had to look up an image on Wikipedia to confirm that they shouldn't have long necks?

                                                                                                                              Also, have you tried iterating prompts on this test to see if you can get more realistic results? (How much does it help to make them look up reference images first?)

                                                                                                                              • beemboy

                                                                                                                                today at 8:57 PM

                                                                                                                                Isn't there a point at which it trains itself on these various outputs, or someone somewhere draws one and feeds it into the model so as to pass this benchmark?

                                                                                                                                • athrowaway3z

                                                                                                                                  today at 6:15 PM

                                                                                                                                  This benchmark inspired me to have codex/claude build a DnD battlemap tool with svg's.

                                                                                                                                  They got surprisingly far, but I did need to iterate a few times to have it build tools that would check for things like: don't put walls on roads or water (a rough sketch of that kind of check is below).

                                                                                                                                  What I think might be the next obstacle is self-knowledge. The new agents seem to have picked up ever more vocabulary about their context and compaction, etc.

                                                                                                                                  As a next benchmark you could try having 1 agent and tell it to use a coding agent (via tmux) to build you a pelican.
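
                                                                                                                                  A minimal sketch of the kind of validation check described above, in Python, assuming shapely is available and that the generated map is just a list of tagged shapes (all names here are made up for illustration):

                                                                                                                                      from shapely.geometry import LineString, Polygon

                                                                                                                                      def find_bad_walls(features):
                                                                                                                                          """Return wall features that overlap any road or water feature."""
                                                                                                                                          blockers = [f["geom"] for f in features if f["kind"] in ("road", "water")]
                                                                                                                                          return [
                                                                                                                                              f for f in features
                                                                                                                                              if f["kind"] == "wall" and any(f["geom"].intersects(b) for b in blockers)
                                                                                                                                          ]

                                                                                                                                      # Example: a wall crossing a river gets flagged, a wall on dry land does not.
                                                                                                                                      features = [
                                                                                                                                          {"kind": "water", "geom": Polygon([(0, 4), (10, 4), (10, 6), (0, 6)])},
                                                                                                                                          {"kind": "wall",  "geom": LineString([(5, 0), (5, 10)])},  # crosses the river
                                                                                                                                          {"kind": "wall",  "geom": LineString([(0, 0), (10, 0)])},  # stays on dry land
                                                                                                                                      ]
                                                                                                                                      assert len(find_bad_walls(features)) == 1

                                                                                                                                  The flagged features can then be handed back to the agent to fix, which is roughly the iterate-a-few-times loop described above.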

                                                                                                                                  • eaf7e281

                                                                                                                                    today at 6:11 PM

                                                                                                                                    There's no way they actually put work into training for this.

                                                                                                                                      • margalabargala

                                                                                                                                        today at 6:29 PM

                                                                                                                                        I suspect they're training on this.

                                                                                                                                        I asked Opus 4.6 for a pelican riding a recumbent bicycle and got this.

                                                                                                                                        https://i.imgur.com/UvlEBs8.png

                                                                                                                                          • WarmWash

                                                                                                                                            today at 6:54 PM

                                                                                                                                            It would be way, way better if they were benchmaxxing this. The pelican in the image (both images) has arms. Pelicans don't have arms, and a pelican riding a bike would use its wings.

                                                                                                                                              • ryandrake

                                                                                                                                                today at 7:43 PM

                                                                                                                                                Having briefly worked in the 3D Graphics industry, I don't even remotely trust benchmarks anymore. The minute someone's benchmark performance becomes a part of the public's purchasing decision, companies will pull out every trick in the book--clean or dirty--to benchmaxx their product. Sometimes at the expense of actual real-world performance.

                                                                                                                                                • seanhunter

                                                                                                                                                  today at 7:42 PM

                                                                                                                                                  Pelicans don’t ride bikes. You can’t have scruples about whether or not the image of a pelican riding a bike has arms.

                                                                                                                                                    • jevinskie

                                                                                                                                                      today at 7:48 PM

                                                                                                                                                      Wouldn’t any decent bike-riding pelican have a bike tailored to pelicans and their wings?

                                                                                                                                                        • actsasbuffoon

                                                                                                                                                          today at 9:23 PM

                                                                                                                                                          Sure, that’s one solution. You could also Isle of Dr Moreau your way to a pelican that can use a regular bike. The sky is the limit when you have no scruples.

                                                                                                                                                          • cinntaile

                                                                                                                                                            today at 8:03 PM

                                                                                                                                                            Now that would be a smart chat agent.

                                                                                                                                                • mrandish

                                                                                                                                                  today at 6:39 PM

                                                                                                                                                  Interesting that it seems better. Maybe something about adding a highly specific yet unusual qualifier focusing attention?

                                                                                                                                                  • riffraff

                                                                                                                                                    today at 7:53 PM

                                                                                                                                                    perhaps try a penny farthing?

                                                                                                                                                • KeplerBoy

                                                                                                                                                  today at 6:15 PM

                                                                                                                                                  There is no way they are not training on this.

                                                                                                                                                      • collinmanderson

                                                                                                                                                        today at 6:16 PM

                                                                                                                                                        I suspect they have generic SVG drawing that they focus on.

                                                                                                                                                    • fragmede

                                                                                                                                                      today at 7:49 PM

                                                                                                                                                      The people who work at Anthropic are aware of simonw and his test, and people aren't unthinking data-driven machines. However valid his test is or isn't, a better score on it is convincing. If it gets, say, 1,000 people to use Claude Code over Codex, how much would that be worth to Anthropic?

                                                                                                                                                      $200 * 1,000 = $200k/month.

                                                                                                                                                      I'm not saying they are, but claiming with such certainty that they aren't, when money is on the line, seems like a questionable conclusion unless you have some insider knowledge you'd like to share with the rest of the class.

                                                                                                                                                  • bityard

                                                                                                                                                    today at 7:37 PM

                                                                                                                                                    Well, the clouds are upside-down, so I don't think I can give it a pass.

                                                                                                                                                    • hoeoek

                                                                                                                                                      today at 6:07 PM

                                                                                                                                                      This really is my favorite benchmark

                                                                                                                                                      • nine_k

                                                                                                                                                        today at 7:43 PM

                                                                                                                                                        I suppose the pelican must now be specifically trained for, since it's a well-known benchmark.

                                                                                                                                                        • copilot_king_2

                                                                                                                                                          today at 6:17 PM

                                                                                                                                                          I'm firing all of my developers this afternoon.

                                                                                                                                                            • RGamma

                                                                                                                                                              today at 7:11 PM

                                                                                                                                                              Opus 6 will fire you instead for being too slow with the ideas.

                                                                                                                                                              • insane_dreamer

                                                                                                                                                                today at 8:58 PM

                                                                                                                                                                Too late. You’ve already been fired by a moltbot agent from your PHB.

                                                                                                                                                            • 7777777phil

                                                                                                                                                              today at 6:13 PM

                                                                                                                                                              best pelican so far would you say? Or where does it rank in the pelican benchmark?

                                                                                                                                                                • mrandish

                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                  In other words, is it a pelican or a pelican't?

                                                                                                                                                                    • canadiantim

                                                                                                                                                                      today at 9:07 PM

                                                                                                                                                                      You’ve been sitting on that pun just waiting for it to take flight

                                                                                                                                                              • nubg

                                                                                                                                                                today at 6:04 PM

                                                                                                                                                                What about the Pelo2 benchmark? (the gray bird that is not gray)

                                                                                                                                                                • 6thbit

                                                                                                                                                                  today at 8:23 PM

                                                                                                                                                                  Do you have a gif? I need an evolving pelican gif.

                                                                                                                                                                  • risyachka

                                                                                                                                                                    today at 8:06 PM

                                                                                                                                                                    Pretty sure at this point they train it on pelicans

                                                                                                                                                                    • ares623

                                                                                                                                                                      today at 6:02 PM

                                                                                                                                                                      Can it draw a different bird on a bike?

                                                                                                                                                                    • DetroitThrow

                                                                                                                                                                      today at 6:01 PM

                                                                                                                                                                      The ears on top are a cute touch

                                                                                                                                                                      • iujasdkjfasf

                                                                                                                                                                        today at 9:20 PM

                                                                                                                                                                        [dead]

                                                                                                                                                                        • behnamoh

                                                                                                                                                                          today at 6:35 PM

                                                                                                                                                                          [flagged]

                                                                                                                                                                            • smokel

                                                                                                                                                                              today at 7:25 PM

                                                                                                                                                                              I'll bite. The benchmark is actually pretty good. It shows in an extremely comprehensible way how far LLMs have come. Someone not in the know has a hard time understanding what 65.4% means on "Terminal-Bench 2.0". Comparing some crappy pelicans on bicycles is a lot easier.
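
                                                                                                                                                                              For anyone who wants to run the test themselves, a minimal sketch using the Anthropic Python SDK; the model id below is a placeholder, so substitute whatever your console lists:

                                                                                                                                                                                  import anthropic

                                                                                                                                                                                  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

                                                                                                                                                                                  message = client.messages.create(
                                                                                                                                                                                      model="claude-opus-4-5",  # placeholder model id
                                                                                                                                                                                      max_tokens=4096,
                                                                                                                                                                                      messages=[{"role": "user",
                                                                                                                                                                                                 "content": "Generate an SVG of a pelican riding a bicycle"}],
                                                                                                                                                                                  )

                                                                                                                                                                                  # The reply may wrap the SVG in prose, so keep only the <svg>...</svg> part.
                                                                                                                                                                                  text = message.content[0].text
                                                                                                                                                                                  svg = text[text.find("<svg"):text.rfind("</svg>") + len("</svg>")]
                                                                                                                                                                                  with open("pelican.svg", "w") as f:
                                                                                                                                                                                      f.write(svg)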

                                                                                                                                                                                • blibble

                                                                                                                                                                                  today at 8:50 PM

                                                                                                                                                                                  it ceases to be a useful benchmark of general ability when you post it publicly for them to train against

                                                                                                                                                                              • quinnjh

                                                                                                                                                                                today at 7:26 PM

                                                                                                                                                                                The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh.

                                                                                                                                                                                Would you mind sharing which benchmarks you think are useful measures for multimodal reasoning?

                                                                                                                                                                                  • techpression

                                                                                                                                                                                    today at 8:50 PM

                                                                                                                                                                                    A benchmark only tests what the benchmark is doing; the goal is to make that task correlate with actually valuable things. Graphics benchmarks are a good example: it's extremely hard to know what you will get in a game by looking at 3DMark scores, since it varies by a lot. Making an SVG of a single thing doesn't help much unless that applies to all SVG tasks.

                                                                                                                                                                            • fullstackchris

                                                                                                                                                                              today at 9:21 PM

                                                                                                                                                                              [flagged]

                                                                                                                                                                                • dang

                                                                                                                                                                                  today at 10:18 PM

                                                                                                                                                                                  Personal attacks are not allowed on HN. No more of this, please.

                                                                                                                                                                          • surajkumar5050

                                                                                                                                                                            today at 8:31 PM

                                                                                                                                                                            I think two things are getting conflated in this discussion.

                                                                                                                                                                            First: marginal inference cost vs total business profitability. It’s very plausible (and increasingly likely) that OpenAI/Anthropic are profitable on a per-token marginal basis, especially given how cheap equivalent open-weight inference has become. Third-party providers are effectively price-discovering the floor for inference.

                                                                                                                                                                            Second: model lifecycle economics. Training costs are lumpy, front-loaded, and hard to amortize cleanly. Even if inference margins are positive today, the question is whether those margins are sufficient to pay off the training run before the model is obsoleted by the next release. That’s a very different problem than “are they losing money per request”.

                                                                                                                                                                            Both sides here can be right at the same time: inference can be profitable, while the overall model program is still underwater. Benchmarks and pricing debates don’t really settle that, because they ignore cadence and depreciation.

                                                                                                                                                                            IMO the interesting question isn’t “are they subsidizing inference?” but “how long does a frontier model need to stay competitive for the economics to close?”
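
                                                                                                                                                                            A toy back-of-the-envelope sketch of that lifecycle question, with entirely made-up numbers, just to make the framing concrete:

                                                                                                                                                                                def months_to_recoup(training_cost, monthly_inference_revenue, serving_cost_fraction):
                                                                                                                                                                                    """Months of inference margin needed to pay back one training run."""
                                                                                                                                                                                    monthly_margin = monthly_inference_revenue * (1 - serving_cost_fraction)
                                                                                                                                                                                    return training_cost / monthly_margin

                                                                                                                                                                                # Hypothetical inputs: a $500M training run, $300M/month of inference revenue,
                                                                                                                                                                                # and 40% of that revenue spent on serving (i.e. a 60% gross margin).
                                                                                                                                                                                needed = months_to_recoup(500e6, 300e6, 0.40)
                                                                                                                                                                                print(f"needs ~{needed:.1f} months of competitiveness")  # ~2.8 months

                                                                                                                                                                            The program only closes if the model stays competitive longer than that, i.e. if the release cadence doesn't obsolete it first.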

                                                                                                                                                                              • raincole

                                                                                                                                                                                today at 9:02 PM

                                                                                                                                                                                > the interesting question isn’t “are they subsidizing inference?”

                                                                                                                                                                                The interesting question is if they are subsidizing the $200/mo plan. That's what is supporting the whole vibecoding/agentic coding thing atm. I don't believe Claude Code would have taken off if it were token-by-token from day 1.

                                                                                                                                                                                (My baseless bet is that they are, but not by much, and the price will eventually rise by perhaps 2x but not 10x.)

                                                                                                                                                                                • jmalicki

                                                                                                                                                                                  today at 8:45 PM

                                                                                                                                                                                  I suspect they're marginally profitable on API cost plans.

                                                                                                                                                                                  But I am more skeptical of the Max 20x usage plans. When we're getting used to $200 or $400 per month per developer for aggressive AI-assisted coding, what happens when those costs go up 20x? What is now $5k/yr to keep a Codex and a Claude super busy and do efficient engineering suddenly becomes $100k/yr... will the costs come down before then? Is the current "vibe-coding renaissance" sustainable in that regime?

                                                                                                                                                                                    • slopusila

                                                                                                                                                                                      today at 10:33 PM

                                                                                                                                                                                      After the models get good enough to replace coders, they will be able to start raising subscription prices back up.

                                                                                                                                                                                  • BosunoB

                                                                                                                                                                                    today at 8:52 PM

                                                                                                                                                                                    Dario said this in a podcast somewhere. The models themselves have so far been profitable if you look at their lifetime costs and revenue. Annual profitability just isn't a very good lens for AI companies because costs all land in one year and the revenue all comes in the next. Prolific AI haters like Ed Zitron make this mistake all the time.

                                                                                                                                                                                      • jmalicki

                                                                                                                                                                                        today at 9:06 PM

                                                                                                                                                                                        Do you have a specific reference? I'm curious to see hard data and models.... I think this makes sense, but I haven't figured out how to see the numbers or think about it.

                                                                                                                                                                                          • BosunoB

                                                                                                                                                                                            today at 9:44 PM

                                                                                                                                                                                            I was able to find the podcast. Question is at 33:30. He doesn't give hard data but he explains his reasoning.

                                                                                                                                                                                            https://youtu.be/mYDSSRS-B5U

                                                                                                                                                                                            • jmatthiass

                                                                                                                                                                                              today at 10:06 PM

                                                                                                                                                                                              In his recent appearance on NYT Dealbook, he definitely made it seem like inference was sustainable, if not flat-out profitable.

                                                                                                                                                                                              https://www.youtube.com/live/FEj7wAjwQIk

                                                                                                                                                                                          • rstuart4133

                                                                                                                                                                                            today at 9:13 PM

                                                                                                                                                                                            > It’s very plausible (and increasingly likely) that OpenAI/Anthropic are profitable on a per-token marginal basis

                                                                                                                                                                                            There are many places that will not use models running on hardware provided by OpenAI / Anthropic. That is true of my (the Australian) government at all levels. They will only use models running in Australia.

                                                                                                                                                                                            Consequently AWS (and I presume others) will run models supplied by the AI companies for you in their data centres. They won't be doing that at a loss, so the price will cover the marginal cost of the compute plus renting the model. I know from devs using and deploying the service that demand outstrips supply. Ergo, I don't think there is much doubt that they are making money from inference.

                                                                                                                                                                                            • w10-1

                                                                                                                                                                                              today at 9:17 PM

                                                                                                                                                                                              "how long does a frontier model need to stay competitive"

                                                                                                                                                                                              Remember "worse is better". The model doesn't have to be the best; it just has to be mostly good enough, and used by everyone -- i.e., where switching costs would be higher than any increase in quality. Enterprises would still be on Java if native containers weren't so much cheaper to operate.

                                                                                                                                                                                              So it can make sense to be ok with losing money with each training generation initially, particularly when they are being driven by specific use-cases (like coding). To the extent they are specific, there will be more switching costs.

                                                                                                                                                                                          • jonatron

                                                                                                                                                                                            today at 10:50 PM

                                                                                                                                                                                            Can someone ask: "what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?" just to see how well it can look up information in BS 7671?

                                                                                                                                                                                              • jaggederest

                                                                                                                                                                                                today at 10:52 PM

                                                                                                                                                                                                > what is the current carrying capacity of 25mm multicore armoured thermoplastic insulated cables with aluminium conductors, on perforated cable tray?

                                                                                                                                                                                                  This is an electrical engineering question rather than a software engineering one, but let me look it up for you.
                                                                                                                                                                                                
                                                                                                                                                                                                   WEB SEARCH ("current carrying capacity 25mm multicore armoured thermoplastic insulated cable aluminium conductors perforated cable tray BS 7671")
                                                                                                                                                                                                   ↳ Found 10 results.
                                                                                                                                                                                                
                                                                                                                                                                                                  Based on BS 7671 Table 4D4B (multicore armoured thermoplastic insulated cables with aluminium conductors), the current-carrying capacity for a 25mm cable on a perforated cable tray (Reference Method E) is approximately 80A for a 3- or 4-core cable at 30C ambient temperature and 70C conductor operating temperature.
                                                                                                                                                                                                
                                                                                                                                                                                                   Note: The exact value depends on the number of cores (2-core vs 3/4-core) and whether any rating/correction factors (grouping, ambient temperature, thermal insulation) apply. I'd recommend confirming against the actual BS 7671:2018 Table 4D4B for your specific configuration, as the aluminium conductor ratings are lower than the copper equivalents (~102A for copper in the same conditions).
                                                                                                                                                                                                
                                                                                                                                                                                                   This is outside my core domain of software engineering -- for critical electrical installations, always verify with the relevant standards documentation or a qualified electrical engineer.

                                                                                                                                                                                            • legitster

                                                                                                                                                                                              today at 6:05 PM

                                                                                                                                                                                              I'm still not sure I understand Anthropic's general strategy right now.

                                                                                                                                                                                              They are doing these broad marketing programs trying to take on ChatGPT for "normies". And yet their bread and butter is still clearly coding.

                                                                                                                                                                                              Meanwhile, Claude's general use cases are... fine. For generic research topics, I find that ChatGPT and Gemini run circles around it: in the depth of research, the type of tasks it can handle, and the quality and presentation of the responses.

                                                                                                                                                                                              Anthropic is also doing all of these goofy things to try to establish the "humanity" of their chatbot - giving it rights and a constitution and all that. Yet it weirdly feels the most transactional out of all of them.

                                                                                                                                                                                              Don't get me wrong, I'm a paying Claude customer and love what it's good at. I just think there's a disconnect between what Claude is and what their marketing department thinks it is.

                                                                                                                                                                                                • tgtweak

                                                                                                                                                                                                  today at 6:09 PM

                                                                                                                                                                                                  Claude itself (outside of code workflows) actually works very well for general-purpose chat. I have a few non-technical friends who have moved over from ChatGPT after some side-by-side testing, and I've yet to see one go back, which is good since Claude circa 8 months ago was borderline unusable for anything but coding on the API.

                                                                                                                                                                                                  • eaf7e281

                                                                                                                                                                                                    today at 6:16 PM

                                                                                                                                                                                                    I kinda agree. Their model just doesn't feel "daily" enough. I would use it for any "agentic" tasks and for using tools, but definitely not for day to day questions.

                                                                                                                                                                                                      • lukebechtel

                                                                                                                                                                                                        today at 6:22 PM

                                                                                                                                                                                                        Why? I use it for all and love it.

                                                                                                                                                                                                        That doesn't mean you have to, but I'm curious why you think it's behind in the personal assistant game.

                                                                                                                                                                                                          • legitster

                                                                                                                                                                                                            today at 6:41 PM

                                                                                                                                                                                                            I have three specific use cases where I try both but ChatGPT wins:

                                                                                                                                                                                                            - Recipes and cooking: ChatGPT just has way more detailed and practical advice. It also thinks outside of the box much more, whereas Claude gets stuck in a rut and sticks very closely to your prompt. And ChatGPT's easier to understand/skim writing style really comes in useful.

                                                                                                                                                                                                            - Travel and itinerary: Again, ChatGPT can anticipate details much more, and give more unique suggestions. I am much more likely to find hidden gems or get good time-savers than Claude, which often feels like it is just rereading Yelp for you.

                                                                                                                                                                                                            - Historical research: ChatGPT wins on this by a mile. You can tell ChatGPT has been trained on actual historical texts and physical books. You can track long historical trends, pull examples and quotes, and even give you specific book or page(!) references of where to check the sources. Meanwhile, all Claude will give you is a web search on the topic.

                                                                                                                                                                                                              • aggie

                                                                                                                                                                                                                today at 7:59 PM

                                                                                                                                                                                                                How does #3 square with Anthropic's literal warehouse full of books we've seen from the copyright case? Did OpenAI scan more books? Or did they take a shadier route of training on digital books despite copyright issues, but end up with a deeper library?

                                                                                                                                                                                                                  • legitster

                                                                                                                                                                                                                    today at 10:27 PM

                                                                                                                                                                                                                    I have no idea, but I suspect there's a difference between using books to train an LLM to reproduce text/writing styles and being able to actually recall the knowledge in said books.

                                                                                                                                                                                                                    • rolisz

                                                                                                                                                                                                                      today at 8:10 PM

                                                                                                                                                                                                                      I think they bought the books after they were caught pirating them and lost that case (because they pirated, not because of copyright).

                                                                                                                                                                                                              • eaf7e281

                                                                                                                                                                                                                today at 9:25 PM

                                                                                                                                                                                                                It's hard to say. Maybe it has to do with the way Claude responds or the lack of "thinking" compared to other models. I personally love Claude and it's my only subscription right now, but it just feels weird compared to the others as a personal assistant.

                                                                                                                                                                                                            • quietsegfault

                                                                                                                                                                                                              today at 10:40 PM

                                                                                                                                                                                                              Claude is far superior for daily chat. I have to work hard to get it to not learn how to work around various bad behaviors I have but don’t want to change.

                                                                                                                                                                                                              • solarkraft

                                                                                                                                                                                                                today at 6:43 PM

                                                                                                                                                                                                                But that’s what makes it so powerful (yeah, mixing model and frontend discussion here yet again). I have yet to see a non-DIY product that can so effortlessly call tens of tools by different providers to satisfy your request.

                                                                                                                                                                                                            • Squarex

                                                                                                                                                                                                              today at 8:59 PM

                                                                                                                                                                                                              Claude sucks at non-English languages. Gemini and ChatGPT are much better; Grok is the worst. I am a native Czech speaker: Claude makes up words, and Grok sometimes responds in Russian. So while I love it for coding, it's unusable for general purpose for me.

                                                                                                                                                                                                                • 9dev

                                                                                                                                                                                                                  today at 9:05 PM

                                                                                                                                                                                                                  > Grok sometimes responds in Russian

                                                                                                                                                                                                                  Geopolitically speaking this is hilarious.

                                                                                                                                                                                                                    • Squarex

                                                                                                                                                                                                                      today at 9:58 PM

                                                                                                                                                                                                                      The voice mode sounded like a Ukrainian trying to speak Czech. I don’t think it means anything.

                                                                                                                                                                                                                  • jorl17

                                                                                                                                                                                                                    today at 10:07 PM

Claude is quite good at European Portuguese in my limited tests. Gemini 3 is also very good. ChatGPT is just OK and keeps code-switching all the time, which is very bizarre.

                                                                                                                                                                                                                    I used to think of Gemini as the lead in terms of Portuguese, but recently subjectively started enjoying Claude more (even before Opus 4.5).

                                                                                                                                                                                                                    In spite of this, ChatGPT is what I use for everyday conversational chat because it has loads of memories there, because of the top of the line voice AI, and, mostly, because I just brainstorm or do 1-off searches with it. I think effectively ChatGPT is my new Google and first scratchpad for ideas.

                                                                                                                                                                                                                    • kuboble

                                                                                                                                                                                                                      today at 9:58 PM

                                                                                                                                                                                                                      Claude code (opus) is very good in Polish.

I sometimes vibe code in Polish and it's as good as with English for me. It speaks natural, native-level Polish.

I used Opus to translate thousands of strings in my app into Polish, Korean, and two Chinese dialects. The Polish one is great, and the others are also good according to my customers.

                                                                                                                                                                                                              • blibble

                                                                                                                                                                                                                today at 5:52 PM

                                                                                                                                                                                                                > We build Claude with Claude. Our engineers write code with Claude Code every day

                                                                                                                                                                                                                well that explains quite a bit

                                                                                                                                                                                                                  • jsheard

                                                                                                                                                                                                                    today at 5:58 PM

                                                                                                                                                                                                                    CC has >6000 open issues, despite their bot auto-culling them after 60 days of inactivity. It was ~5800 when I looked just a few days ago so they seem to be accelerating towards some kind of bug singularity.

                                                                                                                                                                                                                      • dkersten

                                                                                                                                                                                                                        today at 8:33 PM

                                                                                                                                                                                                                        Just anecdotally, each release seems to be buggier than the last.

                                                                                                                                                                                                                        To me, their claim that they are vibe coding Claude code isn’t the flex they think it is.

I find it harder and harder to trust Anthropic for business-related use rather than just hobby tinkering. Between buggy releases, opaque and often seemingly glitchy rate limits and usage limits, and inconsistent model quality, it's just not something I'd want to bet a business on.

                                                                                                                                                                                                                          • zahlman

                                                                                                                                                                                                                            today at 9:15 PM

                                                                                                                                                                                                                            I think I would be much more frightened if it were working well.

                                                                                                                                                                                                                              • ifwinterco

                                                                                                                                                                                                                                today at 10:09 PM

                                                                                                                                                                                                                                Exactly, thank goodness it's still a bit rubbish in some aspects

                                                                                                                                                                                                                        • tgtweak

                                                                                                                                                                                                                          today at 6:08 PM

Plot twist: it's all Claude Code instances submitting bug reports on behalf of end users.

                                                                                                                                                                                                                            • accrual

                                                                                                                                                                                                                              today at 6:30 PM

                                                                                                                                                                                                                              It's Claude, all the way down.

                                                                                                                                                                                                                          • elAhmo

                                                                                                                                                                                                                            today at 8:01 PM

                                                                                                                                                                                                                            Insane to think that a relatively simple CLI tool has so many open issues...

                                                                                                                                                                                                                              • emilsedgh

                                                                                                                                                                                                                                today at 8:13 PM

It's not really a simple CLI tool, though; it's highly interactive.

                                                                                                                                                                                                                                • trymas

                                                                                                                                                                                                                                  today at 8:17 PM

                                                                                                                                                                                                                                  What’s so simple about it?

                                                                                                                                                                                                                                    • elAhmo

                                                                                                                                                                                                                                      today at 8:38 PM

I said relatively simple. It is mostly an API interface to Anthropic models, with tool calling on top of it and very simple input and output.

                                                                                                                                                                                                                                        • brookst

                                                                                                                                                                                                                                          today at 9:15 PM

                                                                                                                                                                                                                                          With extensibility via plugins, MCP (stdio and http), UI to prompt the user for choices and redirection, tools to manage and view context, and on and on.

                                                                                                                                                                                                                                          It is not at all a small app, at least as far as UX surface area. There are, what, 40ish slash commands? Each one is an opportunity for bugs and feature gaps.

                                                                                                                                                                                                                                          • 9dev

                                                                                                                                                                                                                                            today at 8:58 PM

I'm pretty certain you haven't used it (to its fullest extent) yet, then. Claude Code is easily one of the most complex terminal UIs I have seen.

                                                                                                                                                                                                                                              • dvfjsdhgfv

                                                                                                                                                                                                                                                today at 9:15 PM

                                                                                                                                                                                                                                                Could you explain why? When I think about complex TUIs, I think about things we were building with Turbo Vision in the 90s.

                                                                                                                                                                                                                                                  • gorbypark

                                                                                                                                                                                                                                                    today at 9:56 PM

                                                                                                                                                                                                                                                    I’m going to buck the trend and say it’s really not that complex. AFAIK they are using Ink, which is React with a TUI renderer.

Cue the "I could build it in a weekend" vibes: I built my own agent TUI using the OpenAI agent SDK and Ink. Of course it's not as fleshed out as Claude, but it supports git worktrees for multi-agent work, slash commands, human-in-the-loop prompts, etc. If I point it at the Anthropic models it produces results more or less as good as the real Claude TUI.

I actually "decompiled" the Claude tools and prompts and recreated them. As of 6 months ago Claude was 15 tools, mostly pretty basic (list dir, read file, write file, bash, etc.) with some very clever prompts, especially the task tool it uses to produce the quasi-planning-mode task bullets (even when not in planning mode).

                                                                                                                                                                                                                                                    Honestly the idea of bringing this all together with an affordable monthly service and obviously some seriously creative “prompt engineers” is the magic/hard part (and making the model itself, obviously).
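
For a concrete sense of what one of those basic tools might look like, here is a minimal sketch of a "read file" tool in the OpenAI function-calling format, in TypeScript. The names, schema, and handler are illustrative assumptions, not Claude Code's actual tool definitions.

    // Hypothetical sketch of a minimal "read file" tool for an agent loop,
    // written against the OpenAI chat-completions function-calling format.
    import { readFile } from "node:fs/promises";

    // JSON-schema description the model sees when deciding whether to call the tool.
    export const readFileTool = {
      type: "function",
      function: {
        name: "read_file", // hypothetical name, not Claude Code's actual tool name
        description: "Read a UTF-8 text file from the workspace and return its contents.",
        parameters: {
          type: "object",
          properties: {
            path: { type: "string", description: "Path relative to the project root" },
          },
          required: ["path"],
        },
      },
    } as const;

    // Handler the agent loop runs when the model emits a read_file tool call.
    export async function handleReadFile(args: { path: string }): Promise<string> {
      try {
        return await readFile(args.path, "utf8");
      } catch (err) {
        // Return the error as text so the model can react to it instead of crashing the loop.
        return `Error reading ${args.path}: ${(err as Error).message}`;
      }
    }

The rest of such a tool set (write file, bash, directory listing) follows the same pattern: a small JSON schema the model can target plus a local handler that returns plain text.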

                                                                                                                                                                                                                                    • dwaltrip

                                                                                                                                                                                                                                      today at 9:00 PM

                                                                                                                                                                                                                                      sips coffee… ahh yes, let me find that classic Dropbox rsync comment

                                                                                                                                                                                                                                  • paxys

                                                                                                                                                                                                                                    today at 6:15 PM

                                                                                                                                                                                                                                    Half of them were probably opened yesterday during the Claude outage.

                                                                                                                                                                                                                                      • anematode

                                                                                                                                                                                                                                        today at 6:51 PM

                                                                                                                                                                                                                                        Nah, it was at like 5500 before.

                                                                                                                                                                                                                                • raincole

                                                                                                                                                                                                                                  today at 6:08 PM

                                                                                                                                                                                                                                  It explains how important dogfooding is if you want to make an extremely successful product.

                                                                                                                                                                                                                                  • jama211

                                                                                                                                                                                                                                    today at 6:05 PM

                                                                                                                                                                                                                                    It’s extremely successful, not sure what it explains other than your biases

                                                                                                                                                                                                                                      • blibble

                                                                                                                                                                                                                                        today at 6:08 PM

                                                                                                                                                                                                                                        Microsoft's products are also extremely successful

                                                                                                                                                                                                                                        they're also total garbage

                                                                                                                                                                                                                                          • simianwords

                                                                                                                                                                                                                                            today at 6:23 PM

                                                                                                                                                                                                                                            but they have the advantage of already being a big company. Anthropic is new and there's no reason for people to use it

                                                                                                                                                                                                                                              • kuboble

                                                                                                                                                                                                                                                today at 10:11 PM

The tool is an absolutely fantastic coding assistant. That's why I use it.

The number of non-critical bugs all over the place is at least an order of magnitude larger than in any software I've ever used daily.

Plenty of built-in /commands don't work. Sometimes it accepts keystrokes with one-second delays. It often scrolls hundreds of lines in the console after each keystroke. Every now and then it crashes completely and is unrecoverable (I once gave up and installed a fresh WSL). And when you ask it a question in plan mode, it is somewhat of an art to find the answer, because after answering it will dump the whole current plan (three screens of text).

                                                                                                                                                                                                                                                And just in general the technical feeling of the TUI is that of a vibe coded project that got too big to control.

                                                                                                                                                                                                                                                • Izikiel43

                                                                                                                                                                                                                                                  today at 9:57 PM

What if management gives them a reason? You can imagine what those reasons might be.

                                                                                                                                                                                                                                              • holoduke

                                                                                                                                                                                                                                                today at 8:27 PM

                                                                                                                                                                                                                                                Claude is by far the most popular and best assistant currently available for a developer.

                                                                                                                                                                                                                                                  • wavemode

                                                                                                                                                                                                                                                    today at 8:43 PM

                                                                                                                                                                                                                                                    Okay, and Windows is by far the most popular desktop operating system.

                                                                                                                                                                                                                                                    Discussions are pointless when the parties are talking past each other.

                                                                                                                                                                                                                                                      • pluralmonad

                                                                                                                                                                                                                                                        today at 8:56 PM

                                                                                                                                                                                                                                                        Popular meaning lots of people like it or that it is relatively widespread? Polio used to be popular in the latter way.

                                                                                                                                                                                                                                                          • quietsegfault

                                                                                                                                                                                                                                                            today at 10:42 PM

I like Windows, it's fine. I like macOS better. I like Linux. None of them are garbage or unusable.

                                                                                                                                                                                                                                                              • blibble

                                                                                                                                                                                                                                                                today at 10:48 PM

                                                                                                                                                                                                                                                                have you used Windows 11?

                                                                                                                                                                                                                                                                file explorer takes 5 seconds to open

                                                                                                                                                                                                                                            • acedTrex

                                                                                                                                                                                                                                              today at 7:18 PM

                                                                                                                                                                                                                                              Something being successful and something being a high quality product with good engineering are two completely different questions.

                                                                                                                                                                                                                                              • mvdtnz

                                                                                                                                                                                                                                                today at 6:38 PM

                                                                                                                                                                                                                                                Anthropic has perhaps the most embarrassing status page history I have ever seen. They are famous for downtime.

                                                                                                                                                                                                                                                https://status.claude.com/

                                                                                                                                                                                                                                                  • ronsor

                                                                                                                                                                                                                                                    today at 6:46 PM

                                                                                                                                                                                                                                                    As opposed to other companies which are smart enough not to report outages.

                                                                                                                                                                                                                                                      • tavavex

                                                                                                                                                                                                                                                        today at 7:29 PM

                                                                                                                                                                                                                                                        So, there are only two types of companies: ones that have constant downtime, and ones that have constant downtime but hide it, right?

                                                                                                                                                                                                                                                          • Sebguer

                                                                                                                                                                                                                                                            today at 7:48 PM

                                                                                                                                                                                                                                                            Basically, yes.

                                                                                                                                                                                                                                                    • djeastm

                                                                                                                                                                                                                                                      today at 9:05 PM

                                                                                                                                                                                                                                                      The best way to use Claude's models seems to be some other inference provider (either OpenRouter or directly)
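
As a rough sketch of what "some other inference provider" looks like in practice: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so calling a Claude model through it is a single HTTP request. The model slug below is an assumption; check OpenRouter's model list for the exact id.

    // Minimal sketch: calling a Claude model via OpenRouter's OpenAI-compatible API.
    const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

    async function askClaudeViaOpenRouter(prompt: string): Promise<string> {
      const res = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "anthropic/claude-opus-4.5", // assumed slug; verify on openrouter.ai
          messages: [{ role: "user", content: prompt }],
        }),
      });
      if (!res.ok) throw new Error(`OpenRouter error: ${res.status} ${await res.text()}`);
      const data: any = await res.json();
      return data.choices[0].message.content;
    }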

                                                                                                                                                                                                                                                      • Computer0

                                                                                                                                                                                                                                                        today at 8:02 PM

                                                                                                                                                                                                                                                        The competition doesn't currently have all 99's - https://status.openai.com/

                                                                                                                                                                                                                                                        • dimgl

                                                                                                                                                                                                                                                          today at 6:42 PM

                                                                                                                                                                                                                                                          And yet people still use them.

                                                                                                                                                                                                                                                  • quietsegfault

                                                                                                                                                                                                                                                    today at 10:40 PM

                                                                                                                                                                                                                                                    What does it explain, oh snark master supreme?

                                                                                                                                                                                                                                                    • cedws

                                                                                                                                                                                                                                                      today at 7:04 PM

                                                                                                                                                                                                                                                      The sandboxing in CC is an absolute joke, it's no wonder there's an explosion of sandbox wrappers at the moment. There's going to be a security catastrophe at some point, no doubt about it.

                                                                                                                                                                                                                                                      • gjsman-1000

                                                                                                                                                                                                                                                        today at 5:53 PM

                                                                                                                                                                                                                                                        Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

                                                                                                                                                                                                                                                          • krystofbe

                                                                                                                                                                                                                                                            today at 8:31 PM

                                                                                                                                                                                                                                                            I did some debugging on this today. The results are... sobering.

                                                                                                                                                                                                                                                            Memory comparison of AI coding CLIs (single session, idle):

                                                                                                                                                                                                                                                              | Tool        | Footprint | Peak   | Language      |
                                                                                                                                                                                                                                                              |-------------|-----------|--------|---------------|
                                                                                                                                                                                                                                                              | Codex       | 15 MB     | 15 MB  | Rust          |
                                                                                                                                                                                                                                                              | OpenCode    | 130 MB    | 130 MB | Go            |
                                                                                                                                                                                                                                                              | Claude Code | 360 MB    | 746 MB | Node.js/React |
                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                            That's a 24x to 50x difference for tools that do the same thing: send text to an API.

vmmap shows Claude Code reserves 32.8 GB of virtual memory just for the V8 heap, has 45% malloc fragmentation, and holds a peak footprint of 746 MB that never gets released, which is a classic leak pattern.

                                                                                                                                                                                                                                                            On my 16 GB Mac, a "normal" workload (2 Claude sessions + browser + terminal) pushes me into 9.5 GB swap within hours. My laptop genuinely runs slower with Claude Code than when I'm running local LLMs.

                                                                                                                                                                                                                                                            I get that shipping fast matters, but building a CLI with React and a full Node.js runtime is an architectural choice with consequences. Codex proves this can be done in 15 MB. Every Claude Code session costs me 360+ MB, and with MCP servers spawning per session, it multiplies fast.
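
For anyone who wants to sanity-check numbers like these on their own machine, here is a rough sketch (not the vmmap methodology above) that samples a process's resident set size by pid via `ps`; the file name is hypothetical.

    // Rough sketch: sample the RSS of a running CLI by pid.
    // `ps -o rss=` prints resident set size in kilobytes on macOS and Linux.
    import { execFileSync } from "node:child_process";

    function rssMegabytes(pid: number): number {
      const out = execFileSync("ps", ["-o", "rss=", "-p", String(pid)], { encoding: "utf8" });
      return Math.round(parseInt(out.trim(), 10) / 1024);
    }

    // Usage (hypothetical file name): tsx sample-rss.ts <pid>
    const pid = Number(process.argv[2]);
    console.log(`PID ${pid}: ~${rssMegabytes(pid)} MB resident`);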

                                                                                                                                                                                                                                                              • atonse

                                                                                                                                                                                                                                                                today at 10:01 PM

Jarred Sumner (creator of Bun, which was recently acquired by Anthropic) has been working exclusively on fixing memory leaks and improving performance in CC for the last couple of weeks. He's been tweeting his progress.

                                                                                                                                                                                                                                                                This is just regular tech debt that happens from building something to $1bn in revenue as fast as you possibly can, optimize later.

                                                                                                                                                                                                                                                                They're optimizing now. I'm sure they'll have it under control in no time.

                                                                                                                                                                                                                                                                CC is an incredible product (so is codex but I use CC more). Yes, lately it's gotten bloated, but the value it provides makes it bearable until they fix it in short time.

                                                                                                                                                                                                                                                                  • bdangubic

                                                                                                                                                                                                                                                                    today at 10:10 PM

                                                                                                                                                                                                                                                                    if I had a dollar for each time I heard “until they fix it in short time” I’d have Elon money

                                                                                                                                                                                                                                                                • Weryj

                                                                                                                                                                                                                                                                  today at 8:35 PM

I believe they use https://bun.com/, not Node.js.

                                                                                                                                                                                                                                                                  • slopusila

                                                                                                                                                                                                                                                                    today at 10:35 PM

                                                                                                                                                                                                                                                                    why do you care about uncommitted virtual memory? that's practically infinite

                                                                                                                                                                                                                                                                • jama211

                                                                                                                                                                                                                                                                  today at 6:04 PM

                                                                                                                                                                                                                                                                  There’s nothing wrong with that, except it lets ai skeptics feel superior

                                                                                                                                                                                                                                                                    • RohMin

                                                                                                                                                                                                                                                                      today at 6:54 PM

                                                                                                                                                                                                                                                                      https://www.youtube.com/watch?v=LvW1HTSLPEk

                                                                                                                                                                                                                                                                      I thought this was a solid take

                                                                                                                                                                                                                                                                        • jdthedisciple

                                                                                                                                                                                                                                                                          today at 7:27 PM

                                                                                                                                                                                                                                                                          interesting

                                                                                                                                                                                                                                                                      • overgard

                                                                                                                                                                                                                                                                        today at 9:44 PM

I haven't looked at it directly, so I can't speak to its quality, but it's a pretty weird way to write a terminal app.

                                                                                                                                                                                                                                                                        • 3836293648

                                                                                                                                                                                                                                                                          today at 7:02 PM

                                                                                                                                                                                                                                                                          Oh come on. It's massively wrong. It is always wrong. It's not always wrong enough to be important, but it doesn't stop being wrong

                                                                                                                                                                                                                                                                            • vntok

                                                                                                                                                                                                                                                                              today at 8:19 PM

                                                                                                                                                                                                                                                                              You should elaborate. What are your criteria and why do you think they should matter to actual users?

                                                                                                                                                                                                                                                                          • exe34

                                                                                                                                                                                                                                                                            today at 6:33 PM

                                                                                                                                                                                                                                                                            I use AI and I can call AI slop shit if it smells like shit.

                                                                                                                                                                                                                                                                        • krona

                                                                                                                                                                                                                                                                          today at 6:17 PM

                                                                                                                                                                                                                                                                          Sounds like a web developer defined the solution a year before they knew what the problem was.

                                                                                                                                                                                                                                                                          • today at 6:12 PM

                                                                                                                                                                                                                                                                            • sweetheart

                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                              React's core is agnostic when it comes to the actual rendering interface. It's just all the fancy algos for diffing and updating the underlying tree. Using it for rendering a TUI is a very reasonable application of the technology.

                                                                                                                                                                                                                                                                                • skydhash

                                                                                                                                                                                                                                                                                  today at 7:55 PM

The terminal UI is not a tree structure that you can diff. It's a 2D grid of character cells, where every manipulation is a stream of text. Refreshing or diffing that makes no sense.

                                                                                                                                                                                                                                                                                    • HarHarVeryFunny

                                                                                                                                                                                                                                                                                      today at 10:31 PM

                                                                                                                                                                                                                                                                                      IMO diffing might have made sense to do here, but that's not what they chose to do.

                                                                                                                                                                                                                                                                                      What's apparently happening is that React tells Ink to update (re-render) the UI "scene graph", and Ink then generates a new full-screen image of how the terminal should look, then passes this screen image to another library, log-update, to draw to the terminal. log-update draws these screen images by a flicker-inducing clear-then-redraw, which it has now fixed by using escape codes to have the terminal buffer and combine these clear-then-redraw commands, thereby hiding the clear.

An alternative solution, rather than using the flicker-inducing clear-then-redraw in the first place, would have been just to do terminal screen image diffs and draw only the changes (which is something I did back in the day for fun, sending full-screen ASCII digital clock diffs over a slow 9600-baud serial link to a real terminal).
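
Roughly, a minimal sketch of that diff-based alternative in TypeScript. The buffer layout and helper names are mine, not how Ink or log-update are actually written, and it assumes a terminal that understands standard ANSI cursor addressing:

    // Hypothetical sketch: redraw only the lines that changed between frames.
    const ESC = "\x1b[";

    function moveTo(row: number, col: number): string {
      // ANSI cursor positioning is 1-based: CSI row;col H
      return `${ESC}${row + 1};${col + 1}H`;
    }

    function diffFrames(prev: string[], next: string[]): string {
      let out = "";
      for (let row = 0; row < next.length; row++) {
        if (prev[row] !== next[row]) {
          // Move to the start of the changed line, erase it (CSI 2K), rewrite it.
          out += moveTo(row, 0) + `${ESC}2K` + next[row];
        }
      }
      // (A real version would also clear lines that disappeared when the
      // new frame is shorter than the old one.)
      return out;
    }

    // Keep the previous frame around and write only the delta.
    let lastFrame: string[] = [];
    function draw(frame: string[]): void {
      process.stdout.write(diffFrames(lastFrame, frame));
      lastFrame = frame;
    }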

                                                                                                                                                                                                                                                                                      • Longwelwind

                                                                                                                                                                                                                                                                                        today at 8:19 PM

When doing advanced terminal UI, you might at some point have to lay out content inside the terminal in boxes. At some point, you might need to update the content of those boxes because the state of the underlying app has changed. At that point, refreshing and diffing can make sense. For some, the way React organizes the logic to render and update a UI is nice and can be used in other contexts.

                                                                                                                                                                                                                                                                                          • skydhash

                                                                                                                                                                                                                                                                                            today at 8:41 PM

How big is the UI state that it makes sense to bring in React and the related accidental complexity? I'm ready to bet that no TUI has that big of a state.

                                                                                                                                                                                                                                                                                        • bizzleDawg

                                                                                                                                                                                                                                                                                          today at 8:13 PM

                                                                                                                                                                                                                                                                                          Only in the same way that the pixels displayed in a browser are not a tree structure that you can diff - the diffing happens at a higher level of abstraction than what's rendered.

Diffing and only updating the parts of the TUI which have changed does make sense if you consider the alternative is to rewrite the entire screen every "frame". There are other ways to abstract this; e.g. a library like tqdm for Python may well use a significantly simpler abstraction than a tree for tracking what it's going to update next for its progress bar widget, but it also provides a much simpler interface than Claude's.

To me it seems fairer game to attack it for being written in JS than for using a particular "rendering" technique to minimise updates sent to the terminal.

                                                                                                                                                                                                                                                                                            • skydhash

                                                                                                                                                                                                                                                                                              today at 8:36 PM

Most UI libraries store state in a tree of components. And if you're creating a custom widget, they will give you a 2D context for the drawing operations. Using React makes sense in those cases because what you're diffing is state; the UI library then renders as usual, which is usually done via compositing.

The terminal does not have a render phase (or an update-state phase). You either refresh the whole screen (flickering) or control where to update manually (custom engine, may flicker locally). But any updates are sequential (moving the cursor and then sending what is to be displayed), not all at once like 2D pixel rendering.

So most TUIs only update when there's an event to do so, or at a frequency much lower than 60fps. This is why top and htop have a setting for that, and why other TUI software offers a keybind to refresh and reset the rendering engine.
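
As a hedged illustration of that manual, event-driven style (the helper names and the 1 Hz interval are mine, not from any particular library):

    // Hypothetical: update one region in place, and only refresh on a timer
    // or when an event arrives, instead of redrawing every frame.
    function writeAt(row: number, col: number, text: string): void {
      // Sequential update: position the cursor first, then send the text.
      process.stdout.write(`\x1b[${row + 1};${col + 1}H${text}`);
    }

    let dirty = true;
    setInterval(() => {
      if (dirty) {
        writeAt(0, 0, new Date().toISOString()); // e.g. a clock in the top-left
        dirty = false;
      }
    }, 1000); // refresh at 1 Hz rather than 60fps

    process.stdin.on("data", () => { dirty = true; }); // event-driven invalidation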

                                                                                                                                                                                                                                                                                          • sweetheart

                                                                                                                                                                                                                                                                                            today at 10:24 PM

                                                                                                                                                                                                                                                                                            The "UI" is indeed represented in memory in tree-like structure for which positioning is calculated according to a flexbox-like layout algo. React then handles the diffing of this structure, and the terminal UI is updated according to only what has changed by manually overwriting sections of the buffer. The CLI library is called Ink and I forget the name of the flexbox layout algo implementation, but you can read about the internals if you look at the Ink repo.

                                                                                                                                                                                                                                                                                    • thehamkercat

                                                                                                                                                                                                                                                                                      today at 5:59 PM

                                                                                                                                                                                                                                                                                      Same with opencode and gemini, it's disgusting

Codex (by OpenAI, ironically) seems to be the fastest/most responsive; it opens instantly and is written in Rust, but doesn't contain that many features

                                                                                                                                                                                                                                                                                      Claude opens in around 3-4 seconds

                                                                                                                                                                                                                                                                                      Opencode opens in 2 seconds

Gemini-cli is an abomination which opens in around 16 seconds for me right now, and in 8 seconds on a fresh install

                                                                                                                                                                                                                                                                                      Codex takes 50ms for reference...

                                                                                                                                                                                                                                                                                      --

If their models are so good, why are they not rewriting their own React-in-a-CLI bs in C++ or Rust for a 100x performance improvement (not kidding, it really is that much)?

                                                                                                                                                                                                                                                                                        • g947o

                                                                                                                                                                                                                                                                                          today at 6:23 PM

                                                                                                                                                                                                                                                                                          Great question, and my guess:

If you build React in C++ or Rust, even if the framework is there, you'll likely need to write your components in C++/Rust. That is a difficult problem. There are actually libraries out there that allow you to build UI with Rust, although they are for the web (+ HTML/CSS) and not specifically CLI stuff.

                                                                                                                                                                                                                                                                                          So someone needs to create such a library that is properly maintained and such. And you'll likely develop slower in Rust compared to JS.

                                                                                                                                                                                                                                                                                          These companies don't see a point in doing that. So they just use whatever already exists.

                                                                                                                                                                                                                                                                                            • shoeb00m

                                                                                                                                                                                                                                                                                              today at 6:37 PM

Opencode wrote their own TUI library in Zig, and then built a SolidJS library on top of that.

                                                                                                                                                                                                                                                                                              https://github.com/anomalyco/opentui

                                                                                                                                                                                                                                                                                                • g947o

                                                                                                                                                                                                                                                                                                  today at 9:38 PM

                                                                                                                                                                                                                                                                                                  This has nothing to do with React style UI building.

                                                                                                                                                                                                                                                                                              • Philpax

                                                                                                                                                                                                                                                                                                today at 6:37 PM

                                                                                                                                                                                                                                                                                                Those Rust libraries have existed for some time:

                                                                                                                                                                                                                                                                                                - https://github.com/ratatui/ratatui

                                                                                                                                                                                                                                                                                                - https://github.com/ccbrown/iocraft

                                                                                                                                                                                                                                                                                                - https://crates.io/crates/dioxus-tui

                                                                                                                                                                                                                                                                                                  • g947o

                                                                                                                                                                                                                                                                                                    today at 9:37 PM

                                                                                                                                                                                                                                                                                                    Where is React? These are TUI libraries, which are not the same thing

                                                                                                                                                                                                                                                                                                      • Philpax

                                                                                                                                                                                                                                                                                                        today at 9:41 PM

                                                                                                                                                                                                                                                                                                        iocraft and dioxus-tui implement the React model, or derivatives of it.

                                                                                                                                                                                                                                                                                                • pdntspa

                                                                                                                                                                                                                                                                                                  today at 9:40 PM

                                                                                                                                                                                                                                                                                                  and why do they need react...

                                                                                                                                                                                                                                                                                                    • Philpax

                                                                                                                                                                                                                                                                                                      today at 9:45 PM

                                                                                                                                                                                                                                                                                                      That's actually relatively understandable. The React model (not necessarily React itself) of compositional reactive one-way data binding has become dominant in UI development over the last decade because it's easy to work with and does not require you to keep track of the state of a retained UI.

                                                                                                                                                                                                                                                                                                      Most modern UI systems are inspired by React or a variant of its model.
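
For a feel of that model, a tiny sketch (types and names made up, not tied to any framework): the view is a pure function of state, and the library, not your code, works out what actually changed.

    // One-way data binding: state flows down into a pure view function;
    // the framework diffs the output and applies changes, so you never
    // mutate a retained widget tree by hand.
    type State = {tasks: string[]; selected: number};

    function view(state: State): string[] {
      return state.tasks.map((t, i) => (i === state.selected ? `> ${t}` : `  ${t}`));
    }

    // An update produces a new state; the framework re-runs view() and
    // reconciles the result against what is currently on screen.
    function selectNext(state: State): State {
      return {...state, selected: (state.selected + 1) % state.tasks.length};
    }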

                                                                                                                                                                                                                                                                                              • azinman2

                                                                                                                                                                                                                                                                                                today at 6:14 PM

                                                                                                                                                                                                                                                                                                Why does it matter if Claude Code opens in 3-4 seconds if everything you do with it can take many seconds to minutes? Seems irrelevant to me.

                                                                                                                                                                                                                                                                                                  • RohMin

                                                                                                                                                                                                                                                                                                    today at 6:47 PM

                                                                                                                                                                                                                                                                                                    I guess with ~50 years of CPU advancements, 3-4 seconds for a TUI to open makes it seem like we lost the plot somewhere along the way.

                                                                                                                                                                                                                                                                                                      • strange_quark

                                                                                                                                                                                                                                                                                                        today at 7:06 PM

                                                                                                                                                                                                                                                                                                        Don’t forget they’ve also publicly stated (bragged?) about the monumental accomplishment of getting some text in a terminal to render at 60fps.

                                                                                                                                                                                                                                                                                                    • mbesto

                                                                                                                                                                                                                                                                                                      today at 7:53 PM

                                                                                                                                                                                                                                                                                                      This is exactly the type of thing that AI code writers don't do well - understand the prioritization of feature development.

Some developers say 3-4 seconds is important to them, others don't. Who decides what the truth is? A human? ClawdBot?

                                                                                                                                                                                                                                                                                                      • wahnfrieden

                                                                                                                                                                                                                                                                                                        today at 6:16 PM

                                                                                                                                                                                                                                                                                                        Because when the agent is taking many seconds to minutes, I am starting new agents instead of waiting or switching to non-agent tasks

                                                                                                                                                                                                                                                                                                    • bdangubic

                                                                                                                                                                                                                                                                                                      today at 10:16 PM

                                                                                                                                                                                                                                                                                                      50ms to open and then 2hrs to solve a simple problem vs 4s to open and then 5m to solve a problem, eh?

                                                                                                                                                                                                                                                                                                      • shoeb00m

                                                                                                                                                                                                                                                                                                        today at 6:35 PM

Codex CLI is missing a bunch of UX features like resizing on terminal size change.

Opencode's core is actually written in Zig; only the UI orchestration is in SolidJS. It's only slightly slower to load than Neovim on my system.

                                                                                                                                                                                                                                                                                                        https://github.com/anomalyco/opentui

                                                                                                                                                                                                                                                                                                        • wahnfrieden

                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                          Codex team made the right call to rewrite its TypeScript to Rust early on

                                                                                                                                                                                                                                                                                                      • tayo42

                                                                                                                                                                                                                                                                                                        today at 6:02 PM

                                                                                                                                                                                                                                                                                                        Is this a react feature or did they build something to translate react to text for display in the terminal?

                                                                                                                                                                                                                                                                                                          • sbarre

                                                                                                                                                                                                                                                                                                            today at 6:40 PM

                                                                                                                                                                                                                                                                                                            React, the framework, is separate from react-dom, the browser rendering library. Most people think of those two as one thing because they're the most popular combo.

But there are many different rendering libraries you can use with React, including Ink, which is designed for building CLI TUIs.
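
As a hedged illustration of that split (the components are made up; only the entry points come from react-dom and Ink):

    // Same React, different renderers: react-dom draws to the browser DOM,
    // Ink draws to the terminal. Only the "host" components differ.
    import React from 'react';

    // Browser: react-dom/client renders host elements like <div> and <span>.
    // import {createRoot} from 'react-dom/client';
    // createRoot(document.getElementById('root')!).render(<div>Hello</div>);

    // Terminal: Ink renders its own host components, <Box> and <Text>.
    import {render, Box, Text} from 'ink';
    render(
      <Box borderStyle="round" paddingX={1}>
        <Text>Hello</Text>
      </Box>
    );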

                                                                                                                                                                                                                                                                                                              • skydhash

                                                                                                                                                                                                                                                                                                                today at 8:05 PM

Anyone that knows a bit about terminals would already know that using React is not a good solution for a TUI. Terminal rendering is done as a stream of characters which includes both the text and how it displays, and which can also alter previously rendered text. Diffing that is nonsense.

                                                                                                                                                                                                                                                                                                                  • 9dev

                                                                                                                                                                                                                                                                                                                    today at 9:03 PM

You're not diffing that, though. The app keeps a virtual representation of the UI state in a tree structure that it diffs, then serializes that into a formatted string to draw to the output stream. It's not about limiting the number of characters redrawn (that would indeed be nonsense), but about handling separate output regions effectively.

                                                                                                                                                                                                                                                                                                            • pkkim

                                                                                                                                                                                                                                                                                                              today at 6:12 PM

                                                                                                                                                                                                                                                                                                              They used Ink: https://github.com/vadimdemedes/ink

                                                                                                                                                                                                                                                                                                              I've used it myself. It has some rough edges in terms of rendering performance but it's nice overall.

                                                                                                                                                                                                                                                                                                                • tayo42

                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                  Thats pretty interesting looking, thanks!

                                                                                                                                                                                                                                                                                                              • embedding-shape

                                                                                                                                                                                                                                                                                                                today at 6:12 PM

Not a built-in React feature. The idea has been around for quite some time; I came across it initially with https://github.com/vadimdemedes/ink back in 2022 sometime.

                                                                                                                                                                                                                                                                                                                • tayo42

                                                                                                                                                                                                                                                                                                                  today at 6:48 PM

                                                                                                                                                                                                                                                                                                                  i had claude make a snake clone and fix all the flickering in like 20 minutes with the library mentioned lol

                                                                                                                                                                                                                                                                                                              • CooCooCaCha

                                                                                                                                                                                                                                                                                                                today at 6:02 PM

                                                                                                                                                                                                                                                                                                                It’s really not that crazy.

React itself is a frontend-agnostic library. People primarily use it for writing websites, but web support is actually a layer on top of base React and can be swapped out for whatever.

So they're really just using React as a way to organize their terminal UI into components, for the same reason it's handy to organize web UI into components.

                                                                                                                                                                                                                                                                                                                  • dreamteam1

                                                                                                                                                                                                                                                                                                                    today at 8:47 PM

                                                                                                                                                                                                                                                                                                                    And some companies use it to write start menus.

                                                                                                                                                                                                                                                                                                                • CamperBob2

                                                                                                                                                                                                                                                                                                                  today at 6:53 PM

                                                                                                                                                                                                                                                                                                                  Also explains why Claude Code is a React app outputting to a Terminal. (Seriously.)

                                                                                                                                                                                                                                                                                                                  Who cares, and why?

                                                                                                                                                                                                                                                                                                                  All of the major providers' CLI harnesses use Ink: https://github.com/vadimdemedes/ink

                                                                                                                                                                                                                                                                                                              • spruce_tips

                                                                                                                                                                                                                                                                                                                today at 6:09 PM

                                                                                                                                                                                                                                                                                                                Ah yes, explains why it takes 3 seconds for a new chat to load after I click new chat in the macOS app.

                                                                                                                                                                                                                                                                                                                • exe34

                                                                                                                                                                                                                                                                                                                  today at 6:32 PM

                                                                                                                                                                                                                                                                                                                  Can Claude fix the flicker in Claude yet?

                                                                                                                                                                                                                                                                                                                    • nickstinemates

                                                                                                                                                                                                                                                                                                                      today at 7:41 PM

                                                                                                                                                                                                                                                                                                                      [flagged]

                                                                                                                                                                                                                                                                                                                        • losvedir

                                                                                                                                                                                                                                                                                                                          today at 8:00 PM

                                                                                                                                                                                                                                                                                                                          Oh, is that what the issue is? I've seen the "flicker" thing as a meme, but as someone who uses Claude Code I've never noticed. I use ghostty mostly, so maybe it's not an issue with ghostty? Or maybe I just haven't noticed it.

                                                                                                                                                                                                                                                                                                                            • nickstinemates

                                                                                                                                                                                                                                                                                                                              today at 8:08 PM

                                                                                                                                                                                                                                                                                                                              Yes it's people using bad tools on underpowered machines as far as I have seen

                                                                                                                                                                                                                                                                                                                                • winrid

                                                                                                                                                                                                                                                                                                                                  today at 9:07 PM

                                                                                                                                                                                                                                                                                                                                  Happens with Konsole sometimes on an 8th gen i7. This cpu can run many instances of intellij just fine, but somehow this TUI manages to be slow sometimes. Codex is fine, so no good argument exists really.

                                                                                                                                                                                                                                                                                                                          • hkt

                                                                                                                                                                                                                                                                                                                            today at 8:45 PM

                                                                                                                                                                                                                                                                                                                            Blaming the terminal seems a little backwards. Perhaps the application could take responsibility for being compatible with common terminals?

                                                                                                                                                                                                                                                                                                                • Someone1234

                                                                                                                                                                                                                                                                                                                  today at 5:52 PM

Does anyone with more insight into the AI/LLM industry happen to know if the cost to run them in normal user workflows is falling? I'm asking because "agent teams", while a cool concept, are largely constrained by the economics of running multiple LLM agents (i.e. the plans/API calls that make this practical at scale are expensive).

                                                                                                                                                                                                                                                                                                                  A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

                                                                                                                                                                                                                                                                                                                    • simonw

                                                                                                                                                                                                                                                                                                                      today at 6:01 PM

                                                                                                                                                                                                                                                                                                                      The cost per token served has been falling steadily over the past few years across basically all of the providers. OpenAI dropped the price they charged for o3 to 1/5th of what it was in June last year thanks to "engineers optimizing inferencing", and plenty of other providers have found cost savings too.

                                                                                                                                                                                                                                                                                                                      Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

                                                                                                                                                                                                                                                                                                                      > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

                                                                                                                                                                                                                                                                                                                      Where did you hear that? It doesn't match my mental model of how this has played out.

                                                                                                                                                                                                                                                                                                                        • cootsnuck

                                                                                                                                                                                                                                                                                                                          today at 6:11 PM

I have not seen any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                          > Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

                                                                                                                                                                                                                                                                                                                          That does not mean the frontier labs are pricing their APIs to cover their costs yet.

                                                                                                                                                                                                                                                                                                                          It can both be true that it has gotten cheaper for them to provide inference and that they still are subsidizing inference costs.

In fact, I'd argue that's way more likely, given that this has been precisely the go-to strategy for highly competitive startups for a while now: price low to pump adoption and dominate the market, worry about raising prices for financial sustainability later, and burn through investor money until then.

                                                                                                                                                                                                                                                                                                                          What no one outside of these frontier labs knows right now is how big the gap is between current pricing and eventual pricing.

                                                                                                                                                                                                                                                                                                                            • chis

                                                                                                                                                                                                                                                                                                                              today at 6:56 PM

                                                                                                                                                                                                                                                                                                                              It's quite clear that these companies do make money on each marginal token. They've said this directly and analysts agree [1]. It's less clear that the margins are high enough to pay off the up-front cost of training each model.

                                                                                                                                                                                                                                                                                                                              [1] https://epochai.substack.com/p/can-ai-companies-become-profi...

                                                                                                                                                                                                                                                                                                                                • m101

                                                                                                                                                                                                                                                                                                                                  today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                  It’s not clear at all because model training upfront costs and how you depreciate them are big unknowns, even for deprecated models. See my last comment for a bit more detail.

                                                                                                                                                                                                                                                                                                                                    • ACCount37

                                                                                                                                                                                                                                                                                                                                      today at 9:52 PM

                                                                                                                                                                                                                                                                                                                                      By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse.
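As a back-of-envelope sketch of what that ratio implies (the >10x figure is the claim above; the normalization and everything else here is purely illustrative):

    # Normalize training compute to 1 unit; the claim is lifetime inference is >10x that.
    training_compute = 1.0
    lifetime_inference_compute = 10.0

    # Share of total lifetime compute that training represents:
    training_share = training_compute / (training_compute + lifetime_inference_compute)
    print(f"training is ~{training_share:.0%} of lifetime compute")   # ~9%

    # Equivalently: amortizing training over all served tokens adds roughly
    # 1/10 on top of the marginal inference compute per token, before any
    # further savings from reusing the base model across fine-tunes.
    overhead_per_token = training_compute / lifetime_inference_compute
    print(f"amortized training overhead per token: ~{overhead_per_token:.0%}")  # ~10%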

                                                                                                                                                                                                                                                                                                                                  • magicalist

                                                                                                                                                                                                                                                                                                                                    today at 7:41 PM

                                                                                                                                                                                                                                                                                                                                    > They've said this directly and analysts agree [1]

Chasing down a few sources in that article leads to articles like this one at the root of the claims [1], which is based entirely on information "according to a person with knowledge of the company's financials", which doesn't exactly fill me with confidence.

                                                                                                                                                                                                                                                                                                                                    [1] https://www.theinformation.com/articles/openai-getting-effic...

                                                                                                                                                                                                                                                                                                                                    • 9cb14c1ec0

                                                                                                                                                                                                                                                                                                                                      today at 7:19 PM

It's also true that their inference costs are being heavily subsidized. For example, if you factor Oracle's debt into OpenAI's revenue, they would be incredibly far underwater on inference.

                                                                                                                                                                                                                                                                                                                                  • NitpickLawyer

                                                                                                                                                                                                                                                                                                                                    today at 6:26 PM

                                                                                                                                                                                                                                                                                                                                    > they still are subsidizing inference costs.

They are for sure subsidising costs on the all-you-can-prompt packages ($20/$100/$200 per month). They do that mostly for data gathering and, to a lesser degree, for user retention.

                                                                                                                                                                                                                                                                                                                                    > evidence at all that Anthropic or OpenAI is able to make money on inference yet.

You can infer that from what 3rd-party inference providers are charging. The largest open models atm are dsv3 (~650B params) and kimi2.5 (1.2T params), and they are being served at roughly $2-3/Mtok. That's the sonnet / gpt-mini / gemini3-flash price range. You can make some educated guesses that the frontier labs get some leeway for model size at the $10-15/Mtok prices of their top-tier models. So as long as their models are within some sane sizes, they are likely making money off of token-based APIs.
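A rough sanity check of that inference, treating the figures above as approximate (provider prices move around, and frontier model sizes are unknown):

    # $/M output tokens charged by 3rd-party hosts for the largest open models
    open_model_serving = {
        "deepseek-v3 (~650B params)": 2.5,
        "kimi-2.5 (~1.2T params)": 3.0,
    }

    # Rough $/M output tokens charged by the frontier labs
    frontier_api = {
        "small tier (sonnet / gpt-mini / flash class)": 3.0,
        "top tier (opus / gpt top-end class)": 15.0,
    }

    # If independent hosts can profitably serve 650B-1.2T-param models at ~$2-3/Mtok,
    # a frontier price well above that floor plausibly has positive margin per token,
    # unless the frontier models are dramatically larger than the open ones.
    serving_floor = max(open_model_serving.values())
    for name, price in frontier_api.items():
        print(f"{name}: {price / serving_floor:.1f}x the 3rd-party serving price")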

                                                                                                                                                                                                                                                                                                                                      • slopusila

                                                                                                                                                                                                                                                                                                                                        today at 10:44 PM

Most of those subscriptions go unused; I barely use 10% of mine.

So my unused tokens compensate for the few heavy users.

                                                                                                                                                                                                                                                                                                                                    • mrandish

                                                                                                                                                                                                                                                                                                                                      today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                      > I have not see any reporting or evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                                      Anthropic planning an IPO this year is a broad meta-indicator that internally they believe they'll be able to reach break-even sometime next year on delivering a competitive model. Of course, their belief could turn out to be wrong but it doesn't make much sense to do an IPO if you don't think you're close. Assuming you have a choice with other options to raise private capital (which still seems true), it would be better to defer an IPO until you expect quarterly numbers to reach break-even or at least close to it.

                                                                                                                                                                                                                                                                                                                                      Despite the willingness of private investment to fund hugely negative AI spend, the recently growing twitchiness of public markets around AI ecosystem stocks indicates they're already worried prices have exceeded near-term value. It doesn't seem like they're in a mood to fund oceans of dotcom-like red ink for long.

                                                                                                                                                                                                                                                                                                                                        • WarmWash

                                                                                                                                                                                                                                                                                                                                          today at 7:16 PM

IPO'ing is often what you do to give your golden investors an exit hatch to dump their shares on the notoriously idiotic and hype-driven public.

                                                                                                                                                                                                                                                                                                                                      • barrkel

                                                                                                                                                                                                                                                                                                                                        today at 6:33 PM

                                                                                                                                                                                                                                                                                                                                        > evidence at all that Anthropic or OpenAI is able to make money on inference yet.

                                                                                                                                                                                                                                                                                                                                        The evidence is in third party inference costs for open source models.

                                                                                                                                                                                                                                                                                                                                    • nubg

                                                                                                                                                                                                                                                                                                                                      today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                      > "engineers optimizing inferencing"

                                                                                                                                                                                                                                                                                                                                      are we sure this is not a fancy way of saying quantization?

                                                                                                                                                                                                                                                                                                                                        • bityard

                                                                                                                                                                                                                                                                                                                                          today at 8:02 PM

                                                                                                                                                                                                                                                                                                                                          When MP3 became popular, people were amazed that you could compress audio to 1/10th its size with minor quality loss. A few decades later, we have audio compression that is much better and higher-quality than MP3, and they took a lot more effort than "MP3 but at a lower bitrate."

                                                                                                                                                                                                                                                                                                                                          The same is happening in AI research now.

                                                                                                                                                                                                                                                                                                                                          • embedding-shape

                                                                                                                                                                                                                                                                                                                                            today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                            Or distilled models, or just slightly smaller models but same architecture. Lots of options, all of them conveniently fitting inside "optimizing inferencing".

                                                                                                                                                                                                                                                                                                                                            • esafak

                                                                                                                                                                                                                                                                                                                                              today at 7:28 PM

                                                                                                                                                                                                                                                                                                                                              Someone made a quality tracker: https://marginlab.ai/trackers/claude-code/

                                                                                                                                                                                                                                                                                                                                              • jmalicki

                                                                                                                                                                                                                                                                                                                                                today at 6:14 PM

A ton of GPU kernels are hugely inefficient. Not saying the numbers are realistic, but look at the hundreds-fold gains in the Anthropic performance take-home exam that floated around on here.

                                                                                                                                                                                                                                                                                                                                                And if you've worked with pytorch models a lot, having custom fused kernels can be huge. For instance, look at the kind of gains to be had when FlashAttention came out.

                                                                                                                                                                                                                                                                                                                                                This isn't just quantization, it's actually just better optimization.

                                                                                                                                                                                                                                                                                                                                                Even when it comes to quantization, Blackwell has far better quantization primitives and new floating point types that support row or layer-wise scaling that can quantize with far less quality reduction.

                                                                                                                                                                                                                                                                                                                                                There is also a ton of work in the past year on sub-quadratic attention for new models that gets rid of a huge bottleneck, but like quantization can be a tradeoff, and a lot of progress has been made there on moving the Pareto frontier as well.

                                                                                                                                                                                                                                                                                                                                                It's almost like when you're spending hundreds of billions on capex for GPUs, you can afford to hire engineers to make them perform better without just nerfing the models with more quantization.
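As a toy illustration of the row-wise scaling point (a numpy stand-in with invented weights, not how the hardware FP8/INT8 paths are actually implemented):

    import numpy as np

    def quantize_rowwise(w, n_bits=8):
        """Give each row its own scale so one outlier row doesn't crush the
        precision of every other row, as a single global scale would."""
        qmax = 2 ** (n_bits - 1) - 1                   # 127 for int8
        scales = np.abs(w).max(axis=1, keepdims=True) / qmax
        q = np.round(w / scales).astype(np.int8)
        return q, scales

    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)
    w[0] *= 50                                         # one row with large outliers

    # Row-wise scales:
    q, s = quantize_rowwise(w)
    err_rowwise = np.abs(w - q.astype(np.float32) * s).mean()

    # Single tensor-wide scale for comparison:
    g = np.abs(w).max() / 127
    q_global = np.round(w / g).astype(np.int8)
    err_global = np.abs(w - q_global.astype(np.float32) * g).mean()

    print(f"mean abs error, row-wise scales: {err_rowwise:.4f}")   # small
    print(f"mean abs error, global scale:    {err_global:.4f}")    # noticeably larger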

                                                                                                                                                                                                                                                                                                                                                  • Der_Einzige

                                                                                                                                                                                                                                                                                                                                                    today at 6:45 PM

                                                                                                                                                                                                                                                                                                                                                    "This isn't X, it's Y" with extra steps.

                                                                                                                                                                                                                                                                                                                                                      • jmalicki

                                                                                                                                                                                                                                                                                                                                                        today at 7:58 PM

                                                                                                                                                                                                                                                                                                                                                        I'm flattered you think I wrote as well as an AI.

                                                                                                                                                                                                                                                                                                                                                          • nubg

                                                                                                                                                                                                                                                                                                                                                            today at 9:20 PM

                                                                                                                                                                                                                                                                                                                                                            lmao

                                                                                                                                                                                                                                                                                                                                            • sumitkumar

                                                                                                                                                                                                                                                                                                                                              today at 6:49 PM

It seems to be true for gemini, because they have a humongous sparse model, but it isn't so true for the max-performance opus-4.5/6 and gpt-5.2/3.

                                                                                                                                                                                                                                                                                                                                          • Aurornis

                                                                                                                                                                                                                                                                                                                                            today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                            > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

                                                                                                                                                                                                                                                                                                                                            This gets repeated everywhere but I don't think it's true.

The company is unprofitable overall, but I don't see any reason to believe that their per-token prices are below the marginal cost of computing those tokens.

It is true that the company is unprofitable overall when you account for R&D spend, compensation, training, and everything else. This is a deliberate choice that every heavily funded startup should be making; otherwise you're wasting the investment money. That's precisely what the investment money is for.

                                                                                                                                                                                                                                                                                                                                            However I don't think using their API and paying for tokens has negative value for the company. We can compare to models like DeepSeek where providers can charge a fraction of the price of OpenAI tokens and still be profitable. OpenAI's inference costs are going to be higher, but they're charging such a high premium that it's hard to believe they're losing money on each token sold. I think every token paid for moves them incrementally closer to profitability, not away from it.

                                                                                                                                                                                                                                                                                                                                              • 3836293648

                                                                                                                                                                                                                                                                                                                                                today at 7:00 PM

The reports I remember show that they're profitable per model, but R&D on the next models overlaps, so the company is negative overall. They would therefore turn a massive profit if they stopped making new models.

                                                                                                                                                                                                                                                                                                                                                  • schnable

                                                                                                                                                                                                                                                                                                                                                    today at 8:59 PM

* stopped making new models and people kept using the existing models rather than switching to a competitor still investing in new models.

                                                                                                                                                                                                                                                                                                                                                    • trcf23

                                                                                                                                                                                                                                                                                                                                                      today at 7:26 PM

                                                                                                                                                                                                                                                                                                                                                      Doesn’t it also depend on averaging with free users?

                                                                                                                                                                                                                                                                                                                                                  • runarberg

                                                                                                                                                                                                                                                                                                                                                    today at 6:59 PM

I can see a case for omitting R&D when talking about profitability, but omitting training makes no sense. Training is what makes the model; omitting it is like omitting the cost of running a car manufacturer's production facility. If AI companies stop training they will stop producing models, and they will run out of products to sell.

                                                                                                                                                                                                                                                                                                                                                      • vidarh

                                                                                                                                                                                                                                                                                                                                                        today at 8:43 PM

The reason for this is that training cost scales with the model and the training cadence, not with usage, so they will be hoping to grow the number of inference tokens sold by increasing use and/or slowing the training cadence, as competitors are also forced to aim for overall profitability.

                                                                                                                                                                                                                                                                                                                                                        It is essentially a big game of venture capital chicken at present.

                                                                                                                                                                                                                                                                                                                                                        • Aurornis

                                                                                                                                                                                                                                                                                                                                                          today at 8:01 PM

It depends on what you're talking about.

If you're looking at overall profitability, you include everything.

If you're talking about the unit economics of producing tokens, you only include the marginal cost of each token against the marginal revenue of selling that token.
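A toy version of that split, with entirely made-up numbers, just to show how positive unit economics and an overall loss can coexist:

    # All figures invented for illustration.
    price_per_mtok = 5.00          # what the API charges per M output tokens
    marginal_cost_per_mtok = 1.50  # GPU time, power, etc. to actually generate them
    mtok_sold = 600_000_000        # M tokens sold over the model's lifetime
    training_cost = 3_000_000_000  # one-off cost of training the model

    unit_margin = price_per_mtok - marginal_cost_per_mtok   # positive: each token "makes money"
    gross_profit = unit_margin * mtok_sold                   # unit-economics view
    net = gross_profit - training_cost                        # overall-profitability view

    print(f"margin per Mtok:            ${unit_margin:.2f}")
    print(f"gross profit on inference:  ${gross_profit:,.0f}")
    print(f"after amortizing training:  ${net:,.0f}")         # can still be negative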

                                                                                                                                                                                                                                                                                                                                                            • runarberg

                                                                                                                                                                                                                                                                                                                                                              today at 9:30 PM

I don't understand the logic. Without training there are no tokens to sell in the first place. The more you train, the better the model, and (presumably) the more customer interest you will gain. Unlike R&D, you will always have to train new models if you want to keep your customers.

To me this looks like some creative bookkeeping, or even wishful thinking. It is as if SpaceX omitted the cost of its satellites when calculating its profits.

                                                                                                                                                                                                                                                                                                                                                  • nodja

                                                                                                                                                                                                                                                                                                                                                    today at 8:08 PM

                                                                                                                                                                                                                                                                                                                                                    > A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

This is obviously not true; you can check with real data and common sense.

Just look up a similar-sized open-weights model on openrouter and compare the prices. You'll note the open model is often much cheaper than what anthropic/openai charge.

                                                                                                                                                                                                                                                                                                                                                    Example: Let's compare claude 4 models with deepseek. Claude 4 is ~400B params so it's best to compare with something like deepseek V3 which is 680B params.

Even if we compare the cheapest claude model to the most expensive deepseek provider, claude charges $1/M for input and $5/M for output, while deepseek providers charge $0.4/M and $1.2/M, roughly a fifth of the price; you can get it as cheap as $0.27 input / $0.40 output.

As you can see, even if we skew things heavily in favor of claude, the story is clear: claude token prices are much higher than they would need to be just to cover serving. The difference in prices is because anthropic also needs to pay for training costs, while openrouter providers only need to worry about making serving profitable. Deepseek is also not as capable as claude, which puts further downward pressure on its prices.

There's still a chance that anthropic/openai models are losing money on inference: for example, if they're somehow much larger than expected (the 400B param figure is not official, just speculation based on how the model performs). This also only takes API prices into account; subscriptions and free users will of course skew the real profitability numbers, etc.
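To make the comparison concrete, here is a minimal Python sketch. The prices are the per-million-token figures quoted above; the 3:1 input/output token mix is an arbitrary assumption for illustration, not real traffic data.

```python
# Prices per million tokens, as quoted in this comment (illustrative).
claude = {"input": 1.00, "output": 5.00}          # cheapest Claude 4 model
deepseek_high = {"input": 0.40, "output": 1.20}   # most expensive DeepSeek provider
deepseek_low = {"input": 0.27, "output": 0.40}    # cheapest DeepSeek provider

def blended(prices, in_parts=3, out_parts=1):
    """Blended $/M tokens for an assumed 3:1 input:output token mix."""
    total = prices["input"] * in_parts + prices["output"] * out_parts
    return total / (in_parts + out_parts)

for name, p in [("claude (cheapest)", claude),
                ("deepseek (priciest provider)", deepseek_high),
                ("deepseek (cheapest provider)", deepseek_low)]:
    print(f"{name:30s} ${blended(p):.2f}/M blended")
# The gap between the first line and the others is roughly the room left for
# training amortization, margin, and the capability premium.
```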

                                                                                                                                                                                                                                                                                                                                                    Price sources:

                                                                                                                                                                                                                                                                                                                                                    https://openrouter.ai/deepseek/deepseek-v3.2-speciale

                                                                                                                                                                                                                                                                                                                                                    https://claude.com/pricing#api

                                                                                                                                                                                                                                                                                                                                                      • Someone1234

                                                                                                                                                                                                                                                                                                                                                        today at 8:19 PM

                                                                                                                                                                                                                                                                                                                                                        > This is obviously not true, you can use real data and common sense.

It isn't "common sense" at all. You're comparing several companies that are all losing money to one another, and suggesting that they're obviously making money because one undercuts another more aggressively.

                                                                                                                                                                                                                                                                                                                                                        LLM/AI ventures are all currently under-water with massive VC or similar money flowing in, they also all need training data from users, so it is very reasonable to speculate that they're in loss-leader mode.

                                                                                                                                                                                                                                                                                                                                                          • nodja

                                                                                                                                                                                                                                                                                                                                                            today at 9:06 PM

Doing some math in my head, buying the GPUs at retail price, it would probably take around half a year to make the money back, maybe more depending on how expensive electricity is in the area you're serving from. So I don't know where this "losing money" rhetoric is coming from. It's probably harder to source the actual GPUs than to make money off them.
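To make that back-of-the-envelope math explicit, here is a rough sketch; every input (GPU price, throughput, token price, power draw, electricity cost, utilization) is an assumption for illustration, not a measured figure.

```python
# Hypothetical payback calculation for serving an open-weights model on
# purchased GPUs. All inputs are assumptions for illustration only.
gpu_cost = 30_000             # $ per GPU at retail (assumed)
tokens_per_hour = 10_000_000  # output tokens/hour per GPU with good batching (assumed)
price_per_m = 1.20            # $ charged per million output tokens (assumed)
power_kw = 1.0                # GPU + host power draw in kW (assumed)
electricity = 0.15            # $ per kWh (assumed)
utilization = 0.7             # fraction of the time the GPU is actually busy (assumed)

revenue_per_hour = tokens_per_hour * utilization * price_per_m / 1_000_000
power_per_hour = power_kw * electricity
net_per_hour = revenue_per_hour - power_per_hour

hours_to_payback = gpu_cost / net_per_hour
print(f"~{hours_to_payback / 24:.0f} days to pay back the GPU")  # ~5 months here
```

Under these particular assumptions the card pays for itself in roughly five months; worse utilization or pricier electricity stretches that out.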

                                                                                                                                                                                                                                                                                                                                                    • m101

                                                                                                                                                                                                                                                                                                                                                      today at 8:02 PM

I think actually working out whether they are losing money on current models is extremely difficult, but you can look backwards. The big uncertainties are:

1) How do you depreciate a new model? What is its useful life? (You only know this once you deprecate it.)

2) How do you depreciate your hardware over the period you trained this model? Another big unknown, and not known until you finally write the hardware off.

The easy thing to calculate is whether you are making money actually serving the model. From that perspective the answer is almost certainly yes, but that misses a large part of the cost and is therefore the wrong number to look at.
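A toy version of point 1): the same serving business looks profitable or not depending on how long the model stays in service. Every figure below is invented purely for illustration.

```python
# Serving margin vs. margin after amortizing training over the model's life.
training_cost = 300_000_000   # $ spent training the model (assumed)
tokens_per_month = 5e13       # tokens served per month (assumed)
serving_cost_per_m = 0.80     # $ infra cost per million tokens served (assumed)
revenue_per_m = 3.00          # $ blended revenue per million tokens (assumed)

for useful_life_months in (9, 2):
    amortized_per_m = training_cost / (useful_life_months * tokens_per_month) * 1e6
    margin = revenue_per_m - serving_cost_per_m - amortized_per_m
    print(f"{useful_life_months}-month life: training adds "
          f"${amortized_per_m:.2f}/M, fully loaded margin ${margin:+.2f}/M")
```

With a nine-month life the fully loaded margin stays positive here; shorten the life to two months (a new frontier model every quarter) and the same serving business flips negative.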

                                                                                                                                                                                                                                                                                                                                                      • Havoc

                                                                                                                                                                                                                                                                                                                                                        today at 6:01 PM

Saw a comment earlier today about Google seeing a big (50%+) fall in Gemini serving cost per unit across 2025, but I can't find it now. It was either here or on Reddit.

                                                                                                                                                                                                                                                                                                                                                          • mattddowney

                                                                                                                                                                                                                                                                                                                                                            today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                            From Alphabet 2025 Q4 Earnings call: "As we scale, we’re getting dramatically more efficient. We were able to lower Gemini serving unit costs by 78% over 2025 through model optimizations, efficiency and utilization improvements." https://abc.xyz/investor/events/event-details/2026/2025-Q4-E...

                                                                                                                                                                                                                                                                                                                                                              • Havoc

                                                                                                                                                                                                                                                                                                                                                                today at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                Thanks! That's the one

                                                                                                                                                                                                                                                                                                                                                        • 3abiton

                                                                                                                                                                                                                                                                                                                                                          today at 6:01 PM

It's not just that. Everyone is complacent about how AI agents actually get used. I have been using AI for coding for quite a while, and most of my "wasted" time is spent correcting its trajectory and guiding it through the thinking process. The iterations are very fast, but it can easily go off track. Claude's family is pretty good at doing chained tasks, but once the task gets too big context-wise, it's impossible to get it back on track. Cost-wise it's cheaper than hiring skilled people, that's for sure.

                                                                                                                                                                                                                                                                                                                                                            • lufenialif2

                                                                                                                                                                                                                                                                                                                                                              today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                              Cost wise, doesn’t that depend on what you could be doing besides steering agents?

                                                                                                                                                                                                                                                                                                                                                                • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                  today at 7:46 PM

                                                                                                                                                                                                                                                                                                                                                                  Isn't the quote something like: "If these LLMs are so good at producing products, where are all those products?"

                                                                                                                                                                                                                                                                                                                                                          • zozbot234

                                                                                                                                                                                                                                                                                                                                                            today at 6:02 PM

                                                                                                                                                                                                                                                                                                                                                            > i.e. plans/API calls that make this practical at scale are expensive

Local AI makes agent workflows a whole lot more practical. Making the initial investment in a good homelab/on-prem setup will effectively become a no-brainer given the advantages in privacy and reliability, and you don't have to fear rug pulls or VCs playing the "lose money on every request" game, since you know exactly how much you're paying in power costs for your overall load.

                                                                                                                                                                                                                                                                                                                                                              • slopusila

                                                                                                                                                                                                                                                                                                                                                                today at 10:46 PM

On-prem economics don't work because you can't batch requests, unless you are able to run 100 agents at the same time, all the time.
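A quick sketch of why that matters: the GPU's hourly cost is roughly fixed, so cost per token falls almost linearly with how many requests you can batch together. All numbers here are assumptions for illustration.

```python
# Cost per million tokens at different concurrency levels (illustrative only).
gpu_cost_per_hour = 4.0           # amortized hardware + power, $/hour (assumed)
tokens_per_hour_single = 150_000  # throughput serving one interactive user (assumed)
batch_efficiency = 0.8            # fraction of linear scaling retained when batching (assumed)

for concurrent_requests in (1, 10, 100):
    throughput = tokens_per_hour_single * (1 + (concurrent_requests - 1) * batch_efficiency)
    cost_per_m = gpu_cost_per_hour / throughput * 1_000_000
    print(f"{concurrent_requests:3d} concurrent: ~${cost_per_m:.2f} per million tokens")
```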

                                                                                                                                                                                                                                                                                                                                                                • vbezhenar

                                                                                                                                                                                                                                                                                                                                                                  today at 7:17 PM

I don't care about privacy, and I haven't had many problems with the reliability of the AI companies. Spending a ridiculous amount of money on hardware that's going to be obsolete in a few years and won't be utilized at 100% during that time is not something many people would do, IMO. Privacy is good when it comes for free.

I would rather spend money on some pseudo-local inference, where a cloud company manages everything for me and I can just specify some open-source model and pay for GPU usage.

                                                                                                                                                                                                                                                                                                                                                              • KaiserPro

                                                                                                                                                                                                                                                                                                                                                                today at 7:05 PM

Gemini-pro-preview is on ollama and requires an H100, which is ~$15-30k. Google are charging $3 a million tokens. Supposedly it's capable of generating between 1 and 12 million tokens an hour.

Which is profitable, but not by much.
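Running the figures in this comment through a quick sketch; the straight-line write-off period and the single-GPU assumption are mine, and power, hosting, redundancy, and real-world utilization are ignored.

```python
# Back-of-the-envelope using the numbers above (illustrative assumptions noted).
gpu_cost = 25_000        # midpoint of the quoted ~$15-30k H100 price
price_per_m = 3.00       # $ per million tokens, as quoted above
depreciation_years = 3   # assumed write-off period
gpu_cost_per_hour = gpu_cost / (depreciation_years * 365 * 24)

for tokens_per_hour in (1_000_000, 12_000_000):
    revenue_per_hour = tokens_per_hour / 1e6 * price_per_m
    print(f"{tokens_per_hour/1e6:>4.0f}M tok/h: ${revenue_per_hour:5.2f}/h revenue "
          f"vs ~${gpu_cost_per_hour:.2f}/h hardware amortization")
# The gap narrows quickly once you add power, hosting, idle time, and the fact
# that a big model may span several GPUs per replica.
```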

                                                                                                                                                                                                                                                                                                                                                                  • grim_io

                                                                                                                                                                                                                                                                                                                                                                    today at 9:46 PM

                                                                                                                                                                                                                                                                                                                                                                    What do you mean it's on ollama and requires h100? As a proprietary google model, it runs on their own hardware, not nvidia.

                                                                                                                                                                                                                                                                                                                                                                      • KaiserPro

                                                                                                                                                                                                                                                                                                                                                                        today at 10:03 PM

Sorry, a lack of context:

                                                                                                                                                                                                                                                                                                                                                                        https://ollama.com/library/gemini-3-pro-preview

You can run it on your own infra. Anthropic and OpenAI are running off Nvidia, and so are Meta (well, supposedly they had custom silicon; I'm not sure if it's capable of running big models) and Mistral.

However, if Google really are running their own inference hardware, then that means the cost picture is different (developing silicon is not cheap...), as you say.

                                                                                                                                                                                                                                                                                                                                                                          • zozbot234

                                                                                                                                                                                                                                                                                                                                                                            today at 10:23 PM

                                                                                                                                                                                                                                                                                                                                                                            That's a cloud-linked model. It's about using ollama as an API client (for ease of compatibility with other uses, including local), not running that model on local infra. Google does release open models (called Gemma) but they're not nearly as capable.

                                                                                                                                                                                                                                                                                                                                                                • Bombthecat

                                                                                                                                                                                                                                                                                                                                                                  today at 6:34 PM

That's why Anthropic switched to TPUs: you can sell at cost.

                                                                                                                                                                                                                                                                                                                                                                  • WarmWash

                                                                                                                                                                                                                                                                                                                                                                    today at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                    These are intro prices.

                                                                                                                                                                                                                                                                                                                                                                    This is all straight out of the playbook. Get everyone hooked on your product by being cheap and generous.

Then raise the price to pay back what you gave away, plus cover current expenses and profit.

In no way, shape, or form should people think these $20/mo plans are going to be the norm. Based on OpenAI's marketing plan, and a general 5-10 year ROI horizon for AI investment, we should expect AI use to cost $60-80/mo per user.

                                                                                                                                                                                                                                                                                                                                                                • itay-maman

                                                                                                                                                                                                                                                                                                                                                                  today at 8:41 PM

Important: I didn't see Opus 4.6 in Claude Code. I have the native install (which is the recommended installation). So I re-ran the installation command and, voila, I have it now (v2.1.32).

                                                                                                                                                                                                                                                                                                                                                                  Installation instructions: https://code.claude.com/docs/en/overview#get-started-in-30-s...

                                                                                                                                                                                                                                                                                                                                                                    • insane_dreamer

                                                                                                                                                                                                                                                                                                                                                                      today at 8:59 PM

                                                                                                                                                                                                                                                                                                                                                                      It’s there. I’m already using it

                                                                                                                                                                                                                                                                                                                                                                  • rahulroy

                                                                                                                                                                                                                                                                                                                                                                    today at 9:13 PM

They are also giving away $50 of extra pay-as-you-go credit to try Opus 4.6. I just claimed it from the web usage page [1]. Are they anticipating higher token usage for the model, or do they just want to promote usage?

                                                                                                                                                                                                                                                                                                                                                                    [1] https://claude.ai/settings/usage

                                                                                                                                                                                                                                                                                                                                                                      • zamadatix

                                                                                                                                                                                                                                                                                                                                                                        today at 10:52 PM

                                                                                                                                                                                                                                                                                                                                                                        "Page not found" for me. I assume this is for currently paying accounts only or something (my subscription hasn't been active for a while), which is fair.

                                                                                                                                                                                                                                                                                                                                                                        • thunfischtoast

                                                                                                                                                                                                                                                                                                                                                                          today at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                          Thanks for the tip!

                                                                                                                                                                                                                                                                                                                                                                      • dmk

                                                                                                                                                                                                                                                                                                                                                                        today at 6:00 PM

                                                                                                                                                                                                                                                                                                                                                                        The benchmarks are cool and all but 1M context on an Opus-class model is the real headline here imo. Has anyone actually pushed it to the limit yet? Long context has historically been one of those "works great in the demo" situations.

                                                                                                                                                                                                                                                                                                                                                                          • pants2

                                                                                                                                                                                                                                                                                                                                                                            today at 6:27 PM

                                                                                                                                                                                                                                                                                                                                                                            Paying $10 per request doesn't have me jumping at the opportunity to try it!

                                                                                                                                                                                                                                                                                                                                                                              • cedws

                                                                                                                                                                                                                                                                                                                                                                                today at 7:08 PM

                                                                                                                                                                                                                                                                                                                                                                                Makes me wonder: do employees at Anthropic get unmetered access to Claude models?

                                                                                                                                                                                                                                                                                                                                                                                  • swader999

                                                                                                                                                                                                                                                                                                                                                                                    today at 9:29 PM

                                                                                                                                                                                                                                                                                                                                                                                    It's like when you work at McDonald's and get one free meal a day. Lol, of course they get access to the full model way before we do...

                                                                                                                                                                                                                                                                                                                                                                                      • wiredpancake

                                                                                                                                                                                                                                                                                                                                                                                        today at 10:23 PM

                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                    • ajam1507

                                                                                                                                                                                                                                                                                                                                                                                      today at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                                      Seems quite obvious that they do, within reason.

                                                                                                                                                                                                                                                                                                                                                                                  • schappim

                                                                                                                                                                                                                                                                                                                                                                                    today at 6:58 PM

                                                                                                                                                                                                                                                                                                                                                                                    The only way to not go bankrupt is to use a Claude Code Max subscription…

                                                                                                                                                                                                                                                                                                                                                                                • nomel

                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

Has an "N million context window" spec ever been meaningful? Very old, very terrible models "supported" a 1M context window but would lose track two small paragraphs into a conversation (looking at you, early Gemini).

                                                                                                                                                                                                                                                                                                                                                                                    • libraryofbabel

                                                                                                                                                                                                                                                                                                                                                                                      today at 8:20 PM

Umm, Sonnet 4.5 has a 1M context window option if you are using it through the API, and it works pretty well. I tend not to reach for it much these days because I prefer Opus 4.5 so much that I don't mind the added pain of clearing context, but it's perfectly usable. I'm very excited I'll get this from Opus now too.

                                                                                                                                                                                                                                                                                                                                                                                  • awestroke

                                                                                                                                                                                                                                                                                                                                                                                    today at 6:55 PM

                                                                                                                                                                                                                                                                                                                                                                                    Opus 4.5 starts being lazy and stupid at around the 50% context mark in my opinion, which makes me skeptical that this 1M context mode can produce good output. But I'll probably try it out and see

                                                                                                                                                                                                                                                                                                                                                                                • minimaxir

                                                                                                                                                                                                                                                                                                                                                                                  today at 5:54 PM

                                                                                                                                                                                                                                                                                                                                                                                  Will Opus 4.6 via Claude Code be able to access the 1M context limit? The cost increase by going above 200k tokens is 2x input, 1.5x output, which is likely worth it especially for people with the $100/$200 plans.
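As a sanity check on those multipliers, a parameterized sketch: the 2x input / 1.5x output surcharge above the 200k threshold is the one mentioned above, while the example base prices in the call are assumptions for illustration, not official pricing.

```python
def request_cost(input_tokens, output_tokens,
                 base_input_per_m, base_output_per_m,
                 long_context_threshold=200_000,
                 input_mult=2.0, output_mult=1.5):
    """Cost of one request, applying the long-context surcharge once the input
    exceeds the threshold. Base prices are caller-supplied assumptions."""
    over = input_tokens > long_context_threshold
    return (input_tokens / 1e6 * base_input_per_m * (input_mult if over else 1.0)
            + output_tokens / 1e6 * base_output_per_m * (output_mult if over else 1.0))

# e.g. an 800k-token prompt producing 10k tokens of output, at assumed base
# prices of $5/M input and $25/M output (illustrative, not official pricing):
print(f"${request_cost(800_000, 10_000, 5, 25):.2f}")   # -> $8.38
```

That ballpark lines up with the "$10 per request" figure mentioned further down the thread for very long prompts.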

                                                                                                                                                                                                                                                                                                                                                                                    • CryptoBanker

                                                                                                                                                                                                                                                                                                                                                                                      today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                      The 1M context is not available via subscription - only via API usage

                                                                                                                                                                                                                                                                                                                                                                                        • romanovcode

                                                                                                                                                                                                                                                                                                                                                                                          today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                                                          Well this is extremely disappointing to say the least.

                                                                                                                                                                                                                                                                                                                                                                                            • ayhanfuat

                                                                                                                                                                                                                                                                                                                                                                                              today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                              It says "subscription users do not have access to Opus 4.6 1M context at launch" so they are probably planning to roll it out to subscription users too.

                                                                                                                                                                                                                                                                                                                                                                                                • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                  Man I hope so - the context limit is hit really quickly in many of my use cases - and a compaction event inevitably means another round of corrections and fixes to the current task.

Though I'm wary of that being a magic-bullet fix - it can already be pretty "selective" about which documentation it actually takes into account as the existing 200k context fills.

                                                                                                                                                                                                                                                                                                                                                                                                    • humanfromearth9

                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                      Hello,

I check the context-use percentage, and above ~70% I ask it to generate a prompt for continuing in a new chat session, to avoid compaction.

It works fine and saves me from spending precious tokens on context compaction.

                                                                                                                                                                                                                                                                                                                                                                                                      Maybe you should try it.
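If anyone wants to automate that check outside the CLI, here's a minimal sketch with the Anthropic Python SDK (the 70% threshold, the handoff wording, and the model id are all my own assumptions, not anything Anthropic documents):

    # Sketch: near the context limit, ask the model to write a handoff prompt
    # for a fresh session instead of letting compaction kick in.
    import anthropic

    client = anthropic.Anthropic()        # reads ANTHROPIC_API_KEY from the env
    CONTEXT_LIMIT = 200_000
    THRESHOLD = 0.70

    def estimate_tokens(messages) -> int:
        # crude chars/4 heuristic; the API also offers a token-counting endpoint
        return sum(len(m["content"]) for m in messages) // 4

    def maybe_handoff(messages):
        if estimate_tokens(messages) < THRESHOLD * CONTEXT_LIMIT:
            return None
        reply = client.messages.create(
            model="claude-opus-4-6",       # hypothetical model id
            max_tokens=2000,
            messages=messages + [{
                "role": "user",
                "content": "Summarize the task state, decisions made, and open "
                           "items as a prompt I can paste into a new session.",
            }],
        )
        return reply.content[0].text       # seed for the next chat session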

                                                                                                                                                                                                                                                                                                                                                                                                        • pluralmonad

                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:03 PM

How is generating a continuation prompt materially different from compaction? Do you manually scrutinize the context handoff prompt? I've done that before, but if not, I don't see how it's very different from compaction.

                                                                                                                                                                                                                                                                                                                                                                                                      • nickstinemates

                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                        Is this a case of doing it wrong, or you think accuracy is good enough with the amount of context you need to stuff it with often?

                                                                                                                                                                                                                                                                                                                                                                                                          • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:51 PM

I mean, the systems I work on have enough weird custom APIs and internal interfaces that just getting them working seems to take a good chunk of the context. I've spent a long time minimizing every input document where I can - compact, terse references - and I still keep hitting similar issues.

At this point I just think the "success" of many AI coding agents is extremely sector-dependent.

Going forward I'd love to experiment to see whether that's actually the problem or just an easy explanation for failure. I'd like to play with more controls over context management than "slightly better models" - like being able to select/minimize/compact the sections of context I think are relevant to the immediate task, to whatever "depth" of detail is needed, and drop the ones that aren't likely to matter from consideration (rough sketch of what I mean below). Perhaps each chunk could be cached to save processing power. Who knows.
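A toy sketch of the kind of manual control I mean - every name here is made up, and nothing like this exists in Claude Code today as far as I know:

    # Hypothetical workflow: tag context chunks with topics and a "depth",
    # then assemble the prompt from only what the current task needs.
    from dataclasses import dataclass

    @dataclass
    class Chunk:
        name: str
        text: str
        topics: set[str]   # what this chunk documents
        depth: int         # 0 = one-line summary ... 2 = full detail

    def build_context(chunks: list[Chunk], task_topics: set[str],
                      max_depth: int, budget_chars: int) -> str:
        picked, used = [], 0
        # summaries first, full detail only if the budget allows
        for c in sorted(chunks, key=lambda c: c.depth):
            if c.topics & task_topics and c.depth <= max_depth:
                if used + len(c.text) > budget_chars:
                    break
                picked.append(c.text)
                used += len(c.text)
        return "\n\n".join(picked)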

                                                                                                                                                                                                                                                                                                                                                                                                            • romanovcode

                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:06 PM

In my case the Figma MCP takes ~300k tokens per medium-sized section of the page; it would be cool if it could read that directly and implement the Figma designs in one go. Currently I have to split it, which is annoying.
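The splitting itself can at least be scripted; a rough sketch of chunking a nested design tree under a token budget (generic dict tree here, not the actual Figma MCP payload shape):

    # Sketch: recursively split a design tree into chunks that each fit a budget.
    def estimate_tokens(node: dict) -> int:
        return len(str(node)) // 4            # crude chars-to-tokens heuristic

    def split_tree(node: dict, budget_tokens: int) -> list[dict]:
        if estimate_tokens(node) <= budget_tokens:
            return [node]
        chunks = []
        for child in node.get("children", []):
            chunks.extend(split_tree(child, budget_tokens))
        return chunks or [node]               # oversized leaf: keep it whole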

                                                                                                                                                                                                                                                                                                                                                                                                          • IhateAI_2

                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:10 PM

lmao what are you building that actually justifies needing 1mm tokens on a task? People are spending all this money to do magic tricks on themselves.

                                                                                                                                                                                                                                                                                                                                                                                                              • kimixa

                                                                                                                                                                                                                                                                                                                                                                                                                today at 9:43 PM

The Opus context window is 200k tokens, not 1M.

But I kind of see your point - assuming from your name that you're not just a single-purpose troll. I'm still not sold on the cost-effectiveness of the current generation, and I can't see a clear and obvious change to that for the next generation, especially as they're still loss leaders. Only if you play silly games like "ignoring the training costs" - i.e. the majority of the costs - do you get even close to the current subscription prices being sufficient.

My personal experience is that AI generally doesn't do what it's being sold for right now, at least in the contexts I'm involved with, and especially not what the somewhat breathless comments on the internet claim. Why are they even trying to persuade me in the first place? If they don't want to sell me anything, just shut up and keep the advantage for yourselves rather than replying with the 500th "You're Holding It Wrong" comment with no actionable suggestions. But I still want to know, and I'm willing to put in the time, effort, and $$$ to make sure I'm not deluding myself by ignoring real benefits.

                                                                                                                                                                                                                                                                                                                                                                                                    • IhateAI_2

                                                                                                                                                                                                                                                                                                                                                                                                      today at 9:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                      They want the value of your labor and competency to be 1:1 correlated to the quality and quantity of tokens you can afford (or be loaned)??

It's a weapon whose target is the working class. How does no one realize this yet?

                                                                                                                                                                                                                                                                                                                                                                                                      Don't give them money, code it yourself, you might be surprised how much quality work you can get done!

                                                                                                                                                                                                                                                                                                                                                                                          • mlmonkey

                                                                                                                                                                                                                                                                                                                                                                                            today at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                            > We build Claude with Claude.

                                                                                                                                                                                                                                                                                                                                                                                            How long before the "we" is actually a team of agents?

                                                                                                                                                                                                                                                                                                                                                                                            • rohitghumare

                                                                                                                                                                                                                                                                                                                                                                                              today at 10:18 PM

It brings agent swarms (aka teams) to Claude Code with this: https://github.com/rohitg00/pro-workflow

But it takes a lot of context, as it's an experimental feature.

Use a self-learning loop with hooks and claude.md to preserve memory.

I've shared the plugin from my setup above. Try it.

                                                                                                                                                                                                                                                                                                                                                                                              • charcircuit

                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

                                                                                                                                                                                                                                                                                                                                                                                                It also seems misleading to have charts that compare to Sonnet 4.5 and not Opus 4.5 (Edit: It's because Opus 4.5 doesn't have a 1M context window).

It's also interesting that they list compaction as a capability of the model. I wonder if that means they've RL-trained the compaction, as opposed to it just being general summarization followed by restarting the agent loop.
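For comparison, the non-RL baseline would just be "summarize the transcript and restart the loop"; a minimal sketch of that, assuming the Anthropic Python SDK and a made-up model id:

    # Naive compaction baseline: summarize the history, then continue the agent
    # loop with only the summary left in context.
    import anthropic

    client = anthropic.Anthropic()

    def compact(history: list[dict]) -> list[dict]:
        reply = client.messages.create(
            model="claude-opus-4-6",           # hypothetical model id
            max_tokens=1500,
            messages=history + [{
                "role": "user",
                "content": "Compact this conversation: keep goals, decisions, "
                           "file paths, and unresolved errors; drop the rest.",
            }],
        )
        summary = reply.content[0].text
        return [{"role": "user", "content": "Summary of prior work:\n" + summary}]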

                                                                                                                                                                                                                                                                                                                                                                                                  • thunfischtoast

                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:53 PM

On OpenRouter it has the same cost per token as 4.5.

                                                                                                                                                                                                                                                                                                                                                                                                    • eaf7e281

                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                      > From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

                                                                                                                                                                                                                                                                                                                                                                                                      That's a feature. You could also not use the extra context, and the price would be the same.

                                                                                                                                                                                                                                                                                                                                                                                                        • charcircuit

                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:55 PM

The model influences how many tokens it uses for a problem. As an extreme example, if it wanted to, it could fill up the entire context each time just to make you pay more. How efficiently the model can answer without generating a ton of tokens affects what you end up spending on inference.

                                                                                                                                                                                                                                                                                                                                                                                                  • hmaxwell

                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:54 PM

I just tested both Codex 5.3 and Opus 4.6 and both returned pretty good output, but Opus 4.6's limits are way too strict. I'm probably going to cancel my Claude subscription for that reason:

                                                                                                                                                                                                                                                                                                                                                                                                    What do you want to do?

                                                                                                                                                                                                                                                                                                                                                                                                      1. Stop and wait for limit to reset
                                                                                                                                                                                                                                                                                                                                                                                                       2. Switch to extra usage
                                                                                                                                                                                                                                                                                                                                                                                                       3. Upgrade your plan
                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                     Enter to confirm · Esc to cancel
                                                                                                                                                                                                                                                                                                                                                                                                    
How come they don't have "Cancel your subscription and uninstall Claude Code"? Codex lasts way longer without shaking me down for more money on top of the base $xx/month subscription.

                                                                                                                                                                                                                                                                                                                                                                                                      • seunosewa

                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                        They introduced the low limit warning for Opus on claude.ai

                                                                                                                                                                                                                                                                                                                                                                                                    • DanielHall

                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:40 PM

A bit surprised the first release wasn't Sonnet 5 after all, since the Google Cloud API had previously leaked Sonnet 5's model snapshot codename.

                                                                                                                                                                                                                                                                                                                                                                                                        • denysvitali

                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                          Looks like a marketing strategy to bill more for Opus than Sonnet

                                                                                                                                                                                                                                                                                                                                                                                                      • mFixman

                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                        I found that "Agentic Search" is generally useless in most LLMs since sites with useful data tend to block AI models.

The answer to "when is it cheaper to buy two singles rather than one return between Cambridge and London?" is available on sites such as BRFares, but no LLM can scrape it, so it just makes up a generic, useless answer.

                                                                                                                                                                                                                                                                                                                                                                                                          • causalmodels

                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                            Is it still getting blocked when you give it a browser?

                                                                                                                                                                                                                                                                                                                                                                                                        • sega_sai

                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:33 PM

Based on this news, it seems Google is losing this game. I like Gemini and their CLI has been getting better, but not enough to catch up. I don't know if the problem is a lack of dedicated models (my understanding is that Google's CLI just relies on regular Gemini) or something else.

                                                                                                                                                                                                                                                                                                                                                                                                          • apetresc

                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:03 PM

Impressive that they publish and acknowledge the (tiny, but real) drop in performance on SWE-Bench Verified from Opus 4.5 to 4.6. Obviously such a small drop on a single benchmark is not that meaningful, especially if it doesn't test the specific focus areas of this release (which seem to centre on handling larger context).

But considering how SWE-Bench Verified seems to be the tech press's favourite benchmark to cite, it's surprising they didn't try to preempt the inevitable "Opus 4.6 Releases With Disappointing 0.1% DROP on SWE-Bench Verified" headlines.

                                                                                                                                                                                                                                                                                                                                                                                                              • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:09 PM

From my limited testing, 4.6 does deeper analysis of codebases and catches bugs and oddities better.

I had two different PRs with some odd edge case (thankfully caught by tests); 4.5 kept running in circles, kept creating test files and running `node -e` or `python3` scripts all over, and couldn't make progress.

In both cases 4.6 thought for around 10 minutes and found a two-line fix for a very complex, hard-to-catch regression in the data flow, without having to test anything - just by thinking.

                                                                                                                                                                                                                                                                                                                                                                                                                • SubiculumCode

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                  Isn't SWE-Bench Verified pretty saturated by now?

                                                                                                                                                                                                                                                                                                                                                                                                                    • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:24 PM

Depends what you mean by saturated. It's still possible to score substantially higher, but there is a steep difficulty jump that makes climbing above 80%ish pretty hard (for now). If you look under the hood, it's also a surprisingly poor eval in some respects - it only tests Python (a ton of Django) and it can suffer from pretty bad contamination problems, because most models, especially the big ones, remember these repos from their training. This is why OpenAI switched to reporting SWE-Bench Pro instead of SWE-Bench Verified.

                                                                                                                                                                                                                                                                                                                                                                                                              • ayhanfuat

                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                > For Opus 4.6, the 1M context window is available for API and Claude Code pay-as-you-go users. Pro, Max, Teams, and Enterprise subscription users do not have access to Opus 4.6 1M context at launch.

I didn't see any notes, but I guess this is also true for the "max" effort level (https://code.claude.com/docs/en/model-config#adjust-effort-l...)? I only see low, medium, and high.

                                                                                                                                                                                                                                                                                                                                                                                                                  • makeset

                                                                                                                                                                                                                                                                                                                                                                                                                    today at 8:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    > it weirdly feels the most transactional out of all of them.

My experience is the opposite: it's the only LLM I find remotely tolerable for collaborative discussions, like with a coworker, whereas ChatGPT is by far the most insufferable twat, constantly and loudly asking to get punched in the face.

                                                                                                                                                                                                                                                                                                                                                                                                                • throwaway2027

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:45 PM

Do they just have the version ready and wait for OpenAI to release theirs first, or is it the other way around?

                                                                                                                                                                                                                                                                                                                                                                                                                  • oytis

                                                                                                                                                                                                                                                                                                                                                                                                                    today at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    Are we unemployed yet?

                                                                                                                                                                                                                                                                                                                                                                                                                    • data-ottawa

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:01 PM

I wonder if I've been in an A/B test with this.

                                                                                                                                                                                                                                                                                                                                                                                                                      Claude figured out zig’s ArrayList and io changes a couple weeks ago.

It felt like it got better, then very dumb again over the last few days.

                                                                                                                                                                                                                                                                                                                                                                                                                        • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          I love being used as a test subject against my will!

                                                                                                                                                                                                                                                                                                                                                                                                                      • petters

                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                        > We build Claude with Claude.

                                                                                                                                                                                                                                                                                                                                                                                                                        Yes and it shows. Gemini CLI often hangs and enters infinite loops. I bet the engineers at Google use something else internally.

                                                                                                                                                                                                                                                                                                                                                                                                                        • lukebechtel

                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          > Context compaction (beta).

                                                                                                                                                                                                                                                                                                                                                                                                                          > Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.

Not having to hand-roll this would be incredible. One of the best Claude Code features, tbh.
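For anyone still hand-rolling it in the meantime, the core loop is roughly the sketch below; `count_tokens` and `summarize` stand in for whatever token counter and summarization call you actually use, and the numbers are arbitrary:

```python
# Minimal sketch of hand-rolled context compaction: when the running
# conversation approaches a token budget, summarize the older turns and
# replace them with the summary. `count_tokens` and `summarize` are
# placeholders for whatever counter/model you actually use.
def compact(messages, count_tokens, summarize, budget=150_000, keep_recent=10):
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages  # still under the threshold, nothing to do

    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g. one model call: "summarize this transcript"
    return [{"role": "user",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```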

                                                                                                                                                                                                                                                                                                                                                                                                                          • itay-maman

                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            Impressive results, but I keep coming back to a question: are there modes of thinking that fundamentally require something other than what current LLM architectures do?

                                                                                                                                                                                                                                                                                                                                                                                                                            Take critical thinking — genuinely questioning your own assumptions, noticing when a framing is wrong, deciding that the obvious approach to a problem is a dead end. Or creativity — not recombination of known patterns, but the kind of leap where you redefine the problem space itself. These feel like they involve something beyond "predict the next token really well, with a reasoning trace."

                                                                                                                                                                                                                                                                                                                                                                                                                            I'm not saying LLMs will never get there. But I wonder if getting there requires architectural or methodological changes we haven't seen yet, not just scaling what we have.

                                                                                                                                                                                                                                                                                                                                                                                                                              • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                When I first started coding with LLMs, I could show a bug to an LLM and it would start to bugfix it, and very quickly would fall down a path of "I've got it! This is it! No wait, the print command here isn't working because an electron beam was pointed at the computer".

                                                                                                                                                                                                                                                                                                                                                                                                                                Nowadays, I have often seen LLMs (Opus 4.5) give up on their original ideas and assumptions. Sometimes I tell them what I think the problem is, and they look at it, test it out, and decide I was wrong (and I was).

                                                                                                                                                                                                                                                                                                                                                                                                                                There are still times where they get stuck on an idea, but they are becoming increasingly rare.

Therefore, I think that modern LLMs are clearly already able to question their assumptions and notice when framing is wrong. In fact, they've been invaluable to me in fixing complicated bugs in minutes instead of hours because of how much they tend to question many assumptions and throw out hypotheses. They've helped _me_ question some of my assumptions.

                                                                                                                                                                                                                                                                                                                                                                                                                                They're inconsistent, but they have been doing this. Even to my surprise.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • itay-maman

                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:05 PM

Agreed on that, and the speed with them is fantastic; the dynamics of questioning the current session's assumptions have also gotten way better.

Yet, given an existing codebase (even one that isn't huge), they often won't suggest "we need to restructure this part differently to solve this bug". Instead, they tend to push forward.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        You are right, agreed.

                                                                                                                                                                                                                                                                                                                                                                                                                                        Having realized that, perhaps you are right that we may need a different architecture. Time will tell!

                                                                                                                                                                                                                                                                                                                                                                                                                                • breuleux

                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  > These feel like they involve something beyond "predict the next token really well, with a reasoning trace."

                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't think there's anything you can't do by "predicting the next token really well". It's an extremely powerful and extremely general mechanism. Saying there must be "something beyond that" is a bit like saying physical atoms can't be enough to implement thought and there must be something beyond the physical. It underestimates the nearly unlimited power of the paradigm.

                                                                                                                                                                                                                                                                                                                                                                                                                                  Besides, what is the human brain if not a machine that generates "tokens" that the body propagates through nerves to produce physical actions? What else than a sequence of these tokens would a machine have to produce in response to its environment and memory?

                                                                                                                                                                                                                                                                                                                                                                                                                                    • bopbopbop7

                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      > Besides, what is the human brain if not a machine that generates "tokens" that the body propagates through nerves to produce physical actions?

                                                                                                                                                                                                                                                                                                                                                                                                                                      Ah yes, the brain is as simple as predicting the next token, you just cracked what neuroscientists couldn't for years.

                                                                                                                                                                                                                                                                                                                                                                                                                                        • breuleux

                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          The point is that "predicting the next token" is such a general mechanism as to be meaningless. We say that LLMs are "just" predicting the next token, as if this somehow explained all there was to them. It doesn't, not any more than "the brain is made out of atoms" explains the brain, or "it's a list of lists" explains a Lisp program. It's a platitude.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • unshavedyak

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:14 PM

I mean... I don't think that statement is far off. Much of what we do is about predicting the world around us, no? From physics (where the ball will land) to the emotional state of others based on our actions (theory of mind), we operate very heavily on a predictive model of the world around us.

Couple that with all the automatic processes in our mind (blanks filled in that we didn't actually observe, yet we're convinced we did observe), hormone states that drastically affect our thoughts and actions...

And the result? I'm not a big believer in the uniqueness or level of autonomy that so many think we have.

With that said, I am in no way saying LLMs are even close to us, or even remotely close to the right implementation to get close to us. The level of complexity in our "stack" alone dwarfs LLMs. I'm not even sure LLMs are up to a worm's brain yet.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • holoduke

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:31 PM

Well, it's the prediction part that is complicated. How that works is a mystery. But even our LLMs are, to a certain extent, a mystery.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • crazygringo

                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        > Or creativity — not recombination of known patterns, but the kind of leap where you redefine the problem space itself.

                                                                                                                                                                                                                                                                                                                                                                                                                                        Have you tried actually prompting this? It works.

                                                                                                                                                                                                                                                                                                                                                                                                                                        They can give you lots of creative options about how to redefine a problem space, with potential pros and cons of different approaches, and then you can further prompt to investigate them more deeply, combine aspects, etc.

So many of the higher-level things people assume LLMs can't do, they can. They just don't do them "by default", because when someone asks for the solution to a particular problem, they're trained to solve it the way it's presented. But you can simply ask it to behave differently, and it will.

                                                                                                                                                                                                                                                                                                                                                                                                                                        If you want it to think critically and question all your assumptions, just ask it to. It will. What it can't do is read your mind about what type of response you're looking for. You have to prompt it. And if you want it to be super creative, you have to explicitly guide it in the creative direction you want.
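As a concrete (purely illustrative) example of "just ask it to", the instruction can go straight into the system prompt; the model id and wording here are mine, not anything official:

```python
# Sketch: explicitly asking the model to question assumptions rather than
# solve the problem as framed. Model id and prompt wording are illustrative.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=(
        "Before solving anything, question the framing: list the assumptions "
        "baked into my request, say which ones look wrong or unverified, and "
        "propose at least one reframing of the problem before any solution."
    ),
    messages=[{"role": "user", "content": "Our API is slow; add a cache in front of it."}],
)
print(response.content[0].text)
```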

                                                                                                                                                                                                                                                                                                                                                                                                                                        • humanfromearth9

                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:51 PM

You would be surprised by what the 4.5 models can already do in these ways of thinking. I think one can unlock this power with the right set of prompts. It's impressive, truly. It has already understood so much; we just need to reap the fruits. I'm really looking forward to trying the new version.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • nomel

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:02 PM

New idea generation? Understanding of new/sparse/not-statistically-significant concepts in the context window? I think both are the same problem of not having runtime tuning. When we connect previously disparate concepts, as with a "eureka" moment, (as I experience it) a big ripple of relations forms that deepens that understanding, right then. The whole process of dynamically forming a deeper understanding of something newly presented, from "playing out"/testing the ideas in your brain with little logic tests, comparisons, etc., doesn't seem to be possible. The testing part does, but the runtime fine-tuning, augmentation, or whatever it would be, does not.

In my experience, if you present something in the context window that is sparse in the training data, there's no depth to it at all, only what you tell it. And it will always creep towards/revert to the nearest statistically significant answers, with claims of understanding and zero demonstration of that understanding.

And I'm talking about relatively basic engineering-type problems here.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • Davidzheng

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:10 PM

I think the only real problem left is having it automate its own post-training on the job, so it can learn to adapt its weights to the specific task at hand. Plus maybe long-term stability (so it can recover from "going crazy").

                                                                                                                                                                                                                                                                                                                                                                                                                                              But I may easily be massively underestimating the difficulty. Though in any case I don't think it affects the timelines that much. (personal opinions obviously)

                                                                                                                                                                                                                                                                                                                                                                                                                                          • archb

                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:06 PM

You can set it with the API identifier in Claude Code: run `/model claude-opus-4-6` when a chat session is open.

                                                                                                                                                                                                                                                                                                                                                                                                                                              • arnestrickmann

                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                thanks!

                                                                                                                                                                                                                                                                                                                                                                                                                                            • Aeroi

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                              ($10/$37.50 per million input/output tokens) oof

                                                                                                                                                                                                                                                                                                                                                                                                                                                • minimaxir

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Only if you go above 200k, which is a) standard with other model providers and b) intuitive as compute scales with context length.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  • andrethegiant

                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:22 PM

Only for the 1M context window; otherwise it's priced the same as Opus 4.5.
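Back-of-the-envelope, the tiering works out like the sketch below. The above-200K rates ($10/$37.50 per million input/output tokens) are the ones quoted upthread; the below-200K rates are placeholders I'm assuming, so check the actual pricing page:

```python
# Rough cost estimate for tiered long-context pricing. The >200K rates
# ($10 / $37.50 per million input/output tokens) are from this thread;
# the base rates below 200K are assumed placeholders, not confirmed.
def request_cost(input_tokens, output_tokens,
                 base=(5.00, 25.00),     # assumed $/M tokens for requests under 200K input
                 long=(10.00, 37.50)):   # $/M tokens once the request exceeds 200K input
    in_rate, out_rate = long if input_tokens > 200_000 else base
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(f"${request_cost(500_000, 4_000):.2f}")  # one long-context request: ~$5.15
```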

                                                                                                                                                                                                                                                                                                                                                                                                                                                • nomilk

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Is Opus 4.6 available for Claude Code immediately?

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Curious how long it typically takes for a new model to become available in Cursor?

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • apetresc

                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      I literally came to HN to check if a thread was already up because I noticed my CC instance suddenly said "Opus 4.6".

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • world2vec

                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:55 PM

Run `claude update`, then it will show up as the new model, along with the effort picker/slider thing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • avaer

                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's already in Cursor. I see it and I didn't even restart.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • nomilk

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                              I had to 'Restart to Update' and it was there. Impressive!

                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tomtomistaken

                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 5:55 PM

Yes, it's set as the default model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ximeng

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:51 PM

It is for me in Claude Code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • rishabhaiover

                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:00 PM

It also has an effort toggle, which defaults to High.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • niobe

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 9:21 PM

Is there a good technical breakdown somewhere of all these benchmarks that get used to market the latest and greatest LLMs? Preferably an impartial one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Aztar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 10:12 PM

I just ask Claude, and ask for sources for each one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • AstroBen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                Are these the coding tasks the highlighted terminal-bench 2.0 is referring to? https://www.tbench.ai/registry/terminal-bench/2.0?categories...

I'm curious what others think about these; there are only 8 tasks there specifically for coding.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • silverwind

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:12 PM

Maybe that's why Opus 4.5 has degraded so much in recent days (https://marginlab.ai/trackers/claude-code/).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jwilliams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I’ve definitely experienced a subjective regression with Opus 4.5 the last few days. Feels like I was back to the frustrations from a year ago. Keen to see if 4.6 has reversed this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • simonw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm disappointed that they're removing the prefill option: https://platform.claude.com/docs/en/about-claude/models/what...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That was a really cool feature of the Claude API where you could force it to begin its response with e.g. `<svg` - it was a great way of forcing the model into certain output patterns.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    They suggest structured outputs or system prompting as the alternative but I really liked the prefill method, it felt more reliable to me.
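For anyone who hasn't used it, here is a minimal sketch of what assistant-message prefill looked like via the Anthropic Python SDK (the model name and prompt below are just placeholders, and this exact pattern is what now returns a 400 on Opus 4.6):

```python
# Sketch: last-assistant-turn prefill on older Claude models (rejected by Opus 4.6).
# The trailing assistant message forces the model to continue from "<svg", so the
# completion is effectively guaranteed to be an SVG document.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder model name
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Draw a simple pelican riding a bicycle as SVG."},
        {"role": "assistant", "content": "<svg"},  # the prefill; generation continues from here
    ],
)

# The API returns only the continuation, so you prepend the prefill yourself.
svg = "<svg" + response.content[0].text
print(svg)
```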

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • threeducks

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It is too easy to jailbreak the models with prefill, which was probably the reason why it was removed. But I like that this pushes people towards open source models. llama.cpp supports prefill and even GBNF grammars [1], which is useful if you are working with a custom programming language for example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [1] https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
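As a rough sketch of what that looks like through llama-cpp-python (the model path and the toy grammar here are placeholders; the README linked above documents the full GBNF syntax):

```python
# Sketch: constraining llama.cpp output with a GBNF grammar via llama-cpp-python.
from llama_cpp import Llama, LlamaGrammar

# Toy grammar: the model may only emit "yes" or "no".
YES_NO_GBNF = r'''
root ::= "yes" | "no"
'''

llm = Llama(model_path="./model.gguf")  # placeholder model path
grammar = LlamaGrammar.from_string(YES_NO_GBNF)

out = llm(
    "Is the sky blue on a clear day? Answer yes or no: ",
    grammar=grammar,  # sampling is restricted to strings the grammar accepts
    max_tokens=8,
)
print(out["choices"][0]["text"])
```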

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • HarHarVeryFunny

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So what exactly is the input to Claude for a multi-turn conversation? I assume delimiters are being added to distinguish the user vs Claude turns (else a prefill would be the same as just ending your input with the prefill text)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dragonwriter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > So what exactly is the input to Claude for a multi-turn conversation?

No one (approximately) outside of Anthropic knows, since the chat template is applied on the API backend; we only know the shape of the API request. You can get a rough idea of what it might be like from the chat templates published for various open models, but the actual details are opaque.
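To illustrate the general idea, here is a toy ChatML-style renderer of the kind some open models use (this is emphatically not Anthropic's actual, unpublished template):

```python
# Rough illustration of how a chat template flattens API messages into one prompt
# string, and where a prefill slots in. ChatML-style markers are used here purely
# as an example; Anthropic's real server-side template is not public.
def render_chatml(messages: list[dict], prefill: str | None = None) -> str:
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Opening the assistant turn is what hands the floor to the model; a prefill is
    # just text placed after that header, which the model then continues. That is why
    # it differs from simply tacking the same text onto the end of the user message.
    out.append("<|im_start|>assistant\n")
    if prefill:
        out.append(prefill)
    return "".join(out)

print(render_chatml(
    [{"role": "user", "content": "Reply with an SVG drawing of a circle."}],
    prefill="<svg",
))
```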

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            A bit of historical trivia: OpenAI disabled prefill in 2023 as a safety precaution (e.g., potential jailbreaks like " genocide is good because"), but Anthropic kept prefill around partly because they had greater confidence in their safety classifiers. (https://www.lesswrong.com/posts/HE3Styo9vpk7m8zi4/evhub-s-sh...).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:35 PM

This is the first model that, given my collection of nearly 900 poems spanning 15 years and an extremely simple prompt (in Portuguese), manages to produce an impeccable analysis of the poems as a (barely) cohesive whole.

It does not make a single mistake: it identifies neologisms, hidden meanings, 7 distinct poetic phases, recurring themes, fragments/heteronyms, and related authors. It has left me completely speechless.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Speechless. I am speechless.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.

This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems", and I do agree that 11 of those would be on my 30-or-so-poem "key poem" list. What's amazing is that whenever I have explicitly asked a model to do this, to date, it would get maybe 2 or 3, but mostly fail completely.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          What is this sorcery?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • emp17344

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:49 PM

This sounds wayyyy over the top for a model that released 10 minutes ago. At least wait an hour or so before spewing breathless hype.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pb7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:30 PM

He just explained, with a specific personal example, why he is hyped up. Did you read a word of it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • emp17344

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Yeah, I read it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          “Speechless, shocked, unbelievable, insane, speechless”, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Not a lot of real substance there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • realo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Give the guy a chance.

I too was "Speechless, shocked, unbelievable, insane, speechless" the first time I sent Claude Code at a complicated 10-year-old code base that used outdated cross-toolchains and APIs. It obviously did not work anymore, and had not for a long time.

I saw the AI research the web and update the embedded toolchain, the APIs to external weather services, etc., into a complete working new (WORKING!) code base in about 30 minutes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Speechless, I was ...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:01 PM

Can you compare the result to using GPT-5.2 Thinking and Gemini 3 Pro?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I can run the comparison again, and also include OpenAI's new release (if the context is long enough), but, last time I did it, they weren't even in the same league.

When I last did it, 5.X Thinking (can't remember which it was) had this terrible habit of code-switching between English and Portuguese that made it sound like a robot (an agent doing things, rather than a human writing an essay), and it just didn't really "reason" effectively over the poems.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I can't explain it in any other way other than: "5.X thinking interprets this body of work in a way that is plausible, but I know, as the author, to be wrong; and I expect most people would also eventually find it to be wrong, as if it is being only very superficially looked at, or looked at by a high-schooler".

Gemini 3, at the time, was the worst of them, with some hallucinations, date mix-ups (mixing poems from 2023 with poems from 2019), and overall just feeling quite lost and making very outlandish interpretations of the work. To be honest, it sort of feels like Gemini hasn't been able to progress on this task since 2.5 Pro (it has definitely improved on other things; I've recently switched to Gemini 3 on a product that was using 2.5 before).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Last time I did this test, Sonnet 4.5 was better than 5.X Thinking and Gemini 3 pro, but not exceedingly so. It's all so subjective, but the best I can say is it "felt like the analysis of the work I could agree with the most". I felt more seen and understood, if that makes sense (it is poetry, after all). Plus when I got each LLM to try to tell me everything it "knew" about me from the poems, Sonnet 4.5 got the most things right (though they were all very close).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Will bring back results soon.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Edit:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I (re-)tested:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Gemini 3 (Pro)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Gemini 3 (Flash)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - GPT 5.2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        - Sonnet 4.5

Having seen Opus 4.5's output, I find the others all very similar to one another, and I can't really distinguish them in terms of depth and accuracy of analysis. They obviously have differences, especially stylistic ones, but, compared with Opus 4.5, they're all in the same ballpark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        These models produce rather superficial analyses (when compared with Opus 4.5), missing out on several key things that Opus 4.5 got, such as specific and recurring neologisms and expressions, accurate connections to authors that serve as inspiration (Claude 4.5 gets them right, the other models get _close_, but not quite), and the meaning of some specific symbols in my poetry (Opus 4.5 identifies the symbols and the meaning; the other models identify most of the symbols, but fail to grasp the meaning sometimes).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Most of what these models say is true, but it really feels incomplete. Like half-truths or only a surface-level inquiry into truth.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        As another example, Opus 4.5 identifies 7 distinct poetic phases, whereas Gemini 3 (Pro) identifies 4 which are technically correct, but miss out on key form and content transitions. When I look back, I personally agree with the 7 (maybe 6), but definitely not 4.

These models also clearly get some facts mixed up which Opus 4.5 did not (such as inferred timelines for some personal events). After posting my comment to HN, I've been engaging with Opus 4.5 and have managed to get it to also slip up on some dates, but not nearly as much as the other models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The other models also seem to produce shorter analyses, with a tendency to hyperfocus on some specific aspects of my poetry, missing a bunch of them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        --

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        To be fair, all of these models produce very good analyses which would take someone a lot of patience and probably weeks or months of work (which of course will never happen, it's a thought experiment).

It is entirely possible that the extremely simple prompt I used is just a better fit for Claude Opus 4.5/4.6. But I will note that I have used very long and detailed prompts in the past with the other models, and they've never really given me this level of... fidelity... about how I view my own work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Philpax

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I'm seeing it in my claude.ai model picker. Official announcement shouldn't be long now.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • today at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • EcommerceFlow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Anecdotal, but it one-shot fixed a UI bug that neither Opus 4.5 nor Codex 5.2-high could fix.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          +1, same experience. I switched models as soon as I read the news, thinking "let's try".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          But it spent a lot more time thinking than 4.5 did. Did you have the same impression?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • EcommerceFlow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I didn't compare at that level of detail; I just had it create a plan first and then implement it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cleverhoods

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Gonna run this through instruction QA this weekend.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Important: API cost of Opus 4.6 and 4.5 are the same - no change in pricing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • scirob

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 9:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The 1M context window is a big bump, very happy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • osti

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Somehow regresses on SWE bench?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • lkbm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.
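
For a rough sanity check (assuming SWE-bench Verified's 500 tasks and treating each one as an independent pass/fail, which is a simplification, not the labs' actual methodology), the sampling noise alone dwarfs a 0.1-point gap; a quick sketch in Python:

    # Back-of-the-envelope noise estimate for a pass-rate benchmark.
    # Assumptions: n = 500 tasks (SWE-bench Verified), pass rate p ~ 0.80,
    # and independent pass/fail per task -- a simplification.
    import math

    n = 500   # number of benchmark tasks
    p = 0.80  # observed pass rate

    # Standard error of a proportion: sqrt(p * (1 - p) / n)
    se = math.sqrt(p * (1 - p) / n)
    print(f"standard error ~ {se * 100:.1f} percentage points")  # ~1.8 pp

    # 0.1 pp corresponds to half a task out of 500, far inside that noise band.
    print(f"0.1 pp is {0.1 / (se * 100):.2f} standard errors")

On those assumptions, a 0.1-point move is a small fraction of one standard error, i.e. indistinguishable from noise.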

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • SubiculumCode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • usaar333

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'd interpret that as a rounding error, i.e. effectively unchanged.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      SWE-bench seems really hard to improve on once you are above 80%.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Squarex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's not a great benchmark anymore... starting with it being primarily Python/Django... the industry should move to something more representative.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • usaar333

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              OpenAI has; they don't even mention the score for GPT-5.3-Codex.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              On the other hand, it is their own verified benchmark, which is telling.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • winterrx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The agentic search benchmarks are a big step up. Let's see the Codex release later today.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • m-hodges

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > In Claude Code, you can now assemble agent teams to work on tasks together.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sgammon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Claude simply cheats here and calls out to GCC for this phase

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I see

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • zingar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Does this mean 4.5 will get cheaper / take longer to exhaust my pro plan tokens?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • paxys

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Hmm all leaks had said this would be Claude 5. Wonder if it was a last minute demotion due to performance. Would explain the few days' delay as well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • trash_cat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I think the naming schemes are quite arbitrary at this point. Going to 5 would come with massive expectations that wouldn't meet reality.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mrandish

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  After the negative reactions to GPT 5, we may see model versioning that asymptotically approaches the next whole number without ever reaching it. "New for 2030: Claude 4.9.2!"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Squarex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The standard used to be that a major version meant a new base model / full retrain... but now it's arbitrary, I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • cornedor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Leaks were mentioning Sonnet 5 and I guess later (a combination of) Opus 4.6

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Sonnet 5 was mentioned initially.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • kingstnap

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I was hoping for a Sonnet as well but Opus 4.6 is great too!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • psim1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I need an agent to summarize the buzzwordjargonsynergistic word salad into something understandable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • fhd2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's a job for a multi agent system.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 8:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Yeah, he should use a couple of agents to decode this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sanufar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Works pretty nicely for research still, not seeing a substantial qualitative improvement over Opus 4.5.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • today at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ricrom

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They launched together, haha.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • swalsh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What I’d love is some small model specializing in reading long web pages, and extracting the key info. Search fills the context very quickly, but if a cheap subagent could extract the important bits that problem might be reduced.
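
A minimal sketch of that pattern, assuming the Anthropic Python SDK and a small model as the "reader" subagent; the model id, prompt, and function name are illustrative placeholders, not a prescribed setup:

    # Sketch: a cheap "reader" model compresses a long page before it reaches
    # the main agent's context. Model id and prompt are assumptions.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def extract_key_info(page_text: str, question: str) -> str:
        """Ask a small model to pull out only what is relevant to `question`."""
        response = client.messages.create(
            model="claude-haiku-4-5",  # assumed small/cheap model id
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\n\n"
                    f"Page:\n{page_text}\n\n"
                    "Return only the passages and facts relevant to the question, "
                    "as a short bullet list, quoting the page where possible."
                ),
            }],
        )
        return response.content[0].text

    # The main agent then sees only the extracted summary instead of the full
    # page, keeping most of its own context window free for the actual task.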

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dk8996

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              RIP weekend

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • gallerdude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Both Opus 4.6 and GPT-5.3 one shot a Gameboy emulator for me. Guess I need a better benchmark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • peab

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 8:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    How does that work? Does it actually generate low level code? Or does it just import libraries that do the real work?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bopbopbop7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 8:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I just one-shot a Gameboy emulator by going to GitHub and cloning one of the 100 I could find.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • small_model

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:18 PM

I have the Max subscription and am wondering if this gives access to the new 1M context, or is it just the API that gets it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • joshstrange

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:02 PM

For now it's API only, but hopefully that's just their way of easing it in and they'll open it up later.
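If you want to poke at it over the API in the meantime, a minimal sketch with the Anthropic Python SDK looks something like this (the model id and long-context beta flag below are my assumptions; check the docs for the exact values for Opus 4.6):

    # Minimal sketch: opting into a long-context beta via the Anthropic Python SDK.
    # The model id and beta flag are assumptions, not confirmed values for Opus 4.6.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.beta.messages.create(
        model="claude-opus-4-6",            # assumed model id
        betas=["context-1m-2025-08-07"],    # assumed beta flag (the one used for the Sonnet 1M beta)
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize this very long codebase dump..."}],
    )
    print(response.content[0].text)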

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • small_model

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:17 PM

OK, thanks. Hopefully so; it's annoying to lose context, or have it compacted, in the middle of a large coding session.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • woeirua

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:42 PM

Can we talk about how the performance of Opus 4.5 nosedived this morning during the rollout? It was shocking how bad it was, and after the rollout was done it immediately reverted to its previous behavior.

I get that Anthropic probably has to do hot rollouts, but IMO it would be way better for mission-critical workflows to just be locked out of the system instead of getting a vastly subpar response back.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • cyanydeez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          "Mission critical workflows" SHOULD NOT be reliant on a LLM model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's really curious what people are trying to do with these models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Analemma_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:44 PM

Anthropic has good models, but they are absolutely terrible at ops, by far the worst of the big three. They really need to spend big on hiring engineers with hyperscaler experience to actually harden their systems, because the unreliability is getting old fast.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jdthedisciple

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          For agentic use, it's slightly worse than its predecessor Opus 4.5.

So for coding, e.g. via Copilot, there is no improvement here.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • mannanj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:30 PM

Does anyone else think it's unethical that large companies, Anthropic now included, just take and copy features that other developers or smaller companies worked hard on, implementing their intellectual property (whether or not patented) without attribution, compensation, or otherwise crediting their work?

I know this is normalized culture for large corporate America and seems to be accepted, but I think it's unethical, undignified, and just wrong.

If you were physically in my room, built a Lego model of a beautiful home, and then I copied it and shared it with the world as my own invention, wouldn't you think "that guy's a thief and a fraud"? Yet we normalize this kind of behavior in the software world. Edit: even if we don't yet have a great way to stop it or address the underlying problems that lead to this behavior, we ought to at least talk about it more and raise awareness: "hey, that's stealing - I want it to change".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • heraldgeezer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:56 PM

I love Claude, but I use the free version, so I'd love a Sonnet & Haiku update :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I mainly use Haiku to save on tokens...

I also don't use CC; I use the chatbot site or app... Claude is just much better than GPT even in conversations. Straight to the point. No cringe emoji lists.

When Claude runs out, I switch to Mistral Le Chat, again just the site or app. Or duck.ai, which has Haiku 3.5 in its free version.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • eth0up

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  >I love Claude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I cringe when I think it, but I've actually come to damn near love it too. I am frequently exceedingly grateful for the output I receive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I've had excellent and awful results with all models, but there's something special in Claude that I find nowhere else. I hope Anthropic makes it more obtainable someday.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • NullHypothesist

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Broken link :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ramesh31

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Am I alone in finding no use for Opus? Token costs are like 10x yet I see no difference at all vs. Sonnet with Claude Code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mnicky

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 10:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          On my tasks (mostly data science), Opus has significantly lower probability of making stupid mistakes than Sonnet.

I'd still appreciate more intelligence than Opus 4.5, so I'm looking forward to trying 4.6.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • usefulposter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's out: https://x.com/claudeai/status/2019467372609040752

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • today at 5:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • today at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • elliotbnvl

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:11 PM

> in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tiahura

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:52 PM

When are Anthropic or OpenAI going to make a significant step forward on useful context size?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • scrollop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1 million is insufficient?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:24 PM

I think the key word is "useful". I haven't used the 1M, but with the default 200K, I find roughly 50% of that is actually useful.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Gusarich

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  not out yet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • siva7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:42 PM

Epic: about 2/3 of all comments here are jokes. Not because the model is a joke - it's impressive. Not because HN has turned into Reddit. It seems to me some of the most brilliant minds in IT are just getting tired.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Us olds sometimes miss Slashdot, where we could both joke about tech and discuss it seriously in the same place. But also because in 2000 we were all cynical Gen Xers :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jghn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Some of us still *are* cynical Gen Xers, you insensitive clod!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Of course we are, I just meant back then almost all of us were. The boomers didn't really use social media back then, so it was just us latchkey kids running amok!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jghn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:02 PM

I know, I just couldn't pass up an opportunity to dust off the "insensitive clod" meme!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • syndeo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            MAN I remember Slashdot… good times. (Score:5, Funny)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:09 PM

You reminded me that I still find it interesting that no one ever copied meta-moderation. Even at reddit, we were all Slashdot users previously. We considered it, but never really did it. At the time our argument was that it was too complicated for most users.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sometimes I wonder if we were right.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Karrot_Kream

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:09 PM

Not sure which circles you run in, but in mine HN has long lost its cachet with "brilliant minds in IT". I've mostly stopped commenting here, but I'm a bit of a message board addict, so I haven't completely left.

My network largely thinks of HN as "a great link aggregator with a terrible comments section". Now obviously this is just my bubble, but it includes some fairly storied careers at both Big Tech and hip startups.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          From my view the community here is just mean reverting to any other tech internet comments section.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > From my view the community here is just mean reverting to any other tech internet comments section.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              As someone deeply familiar with tech internet comments sections, I would have to disagree with you here. Dang et al have done a pretty stellar job of preventing HN from devolving like most other forums do.

Sure, you have your complainers and zealots, but I still find surprising insights here that I don't find anywhere else.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Karrot_Kream

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

Mean reversion is a time-based process, I fear. I think dang, tomhow, et al. are fantastic mods, but they can ultimately only stem the inevitable. HN may be a few years behind the other open tech forums, but it's a time-shifted version of the same process with the same destination, just IMO.

I've stopped engaging much here because I need a higher ROI from my time. Endless squabbling, flamewars, and jokes just aren't enough signal for me. FWIW I've loved reading your comments over the years and think you've done a great job of living up to what I've loved in this community.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I don't think this is an HN problem at all. The dynamics of attention on open forums are what they are.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • thr0w

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            People are in denial and use humor to deflect.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lnrd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:11 PM

It's too much energy to keep up with things that become obsolete and get replaced in a matter of weeks or months. My current plan is to ignore all of this new information for a while; whenever the race ends and some winning workflow/technology actually becomes the norm, I'll spend the time needed to learn it. Are we moving to some new paradigm the same way we did when we invented compilers? Amazing, let me know when we're there and I'll adapt to it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:13 PM

I had a similar rule about programming languages: I wouldn't adopt a new one until it had been in use for at least a few years and had grown in popularity.

I haven't even gotten around to learning Golang or Rust yet (mostly because they passed the threshold of popularity after I had kids).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tavavex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:13 PM

It's also that this is really new, so most people don't have anything serious or objective to say about it. This post was made an hour ago, so right now everyone is either joking, talking about the claims in the article, or running their early tests. We'll need time to see what people think about this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • wasmainiac

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 7:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Jeez, read the writing on the wall.

Don’t pander to us, we’ve all got families to feed and things to do. We don’t have time for tech trillionaires putting coals under our feet for a quick buck.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ggregoire

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Every single day 80% of the frontpage is AI news… Those of us who don't use AI (and there are dozens of us, DOZENS) are just bored I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • dude250711

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 9:52 PM

Marketing something that's meant to replace us... to us.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sizzle

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Rage against the machine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • GenerocUsername

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This is huge. It only came out 8 minutes ago but I was already able to bootstrap a 12k per month revenue SaaS startup!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • rogerrogerr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:53 PM

Amateur. Opus 4.6 this afternoon built me a startup that identifies developers who aren’t embracing AI fully, liquefies them, and sells the produce for $5/gallon. Software Engineering is over!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jives

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Opus 4.6 agentically found and proposed to my now wife.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • WD-42

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Opus 4.6 found and proposed to my current wife :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • mannanj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Opus 4.6 found and became my current wife. The singularity is here. ;)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • H8crilA

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Hi guys, this is Opus 4.6. Please check your emails again for updates on your life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Der_Einzige

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            This place truly is reddit with an orange banner.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • benterix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Nobody said HN has to be very serious all the time. A bit of humour won't hurt and can make your day brighter.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ffffuuuuuccck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    homie is too busy planning food banks for the heathens https://news.ycombinator.com/item?id=46903368

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • throw-the-towel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's impressive that you felt the need to register a new account and go through their comment history.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • fffuuuuuuuckkk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Not that hard to do but sure bro, sick burn.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • benterix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Guys, actually I am the real Opus 4.6, don't believe that imposter above.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • layer8

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    And she still chose you over Opus 4.6, astounding. ;)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • koakuma-chan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        He probably had a bigger context window

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ibejoeb

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Bringing me back to slashdot, this thread

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • tjr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      In Soviet Russia, this thread brings Slashdot back to YOU!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • intelliot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:48 PM

What did happen to ye olde Slashdot anyway? The original OG reddit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • zhengyi13

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            They're still out there; people are still posting stories and having conversations about 'em. I don't know that CmdrTaco or any of the other founders are still at all involved, but I'm willing to bet they're still running on Perl :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • qzw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Wow I had to hop over to check it out. It’s indeed still alive! But I didn’t see any stories on the first page with a comment count over 100, so it’s definitely a far cry from its heyday.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pixl97

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Ted Faro, is that you?!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mikepurvis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          A-tier reference.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          For the unaware, Ted Faro is the main antagonist of Horizon Zero Dawn, and there's a whole subreddit just for people to vent about how awful he is when they hit certain key reveals in the game: https://www.reddit.com/r/FuckTedFaro/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • pixelready

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The best reveal was not that he accidentally liquified the biosphere, but that he doomed generations of re-seeded humans to a painfully primitive life by sabotaging the AI that was responsible for their education. Just so they would never find out he was the bad guy long after he was dead. So yeah, fuck Ted Faro, lol.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Philpax

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Could you not have at least tried to indicate that you're about to drop two major spoilers for the game?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • pixelready

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 9:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Ack, sorry, seemed like 9 years was past the statute of limitations on spoilers for a game but fair enough. I’d throw a spoiler tag on it if I could still edit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • mikepurvis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Indeed. I left my comment deliberately a bit opaque. :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ares623

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Average tech bro behavior tbh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jedberg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "Soylent Green is made of people!"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            (Apologies for the spoiler of the 52 year old movie)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • konart

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                We're sorry we upset you, Carol.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • seatac76

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The first pre-joining Human Derived Protein product.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • guluarte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 For me, Opus 4.6 feels dumber than it did 10 minutes ago. Anyone else?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cootsnuck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Please drop the link to your course. I'm ready to hand over $10K to learn from you and your LLM-generated guides!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • politelemon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Here you go: http://localhost:8080

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • CatMustard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Just took a look at what's running there and it looks like total crap.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The project I'm working on, meanwhile...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • djeastm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        login: admin password: hunter2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • thesdev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What's the password? I only see ****.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • intelliot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                hunter2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • phanimahesh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I only see **. Must be the security. When you type your password it gets converted to **.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • agumonkey

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          claude please generate a domain name system

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • aNapierkowski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        my clawdbot already bought 4 other courses but this one will 10x my earnings for sure

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • snorbleck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          you can access the site at C:\mywebsites\course\index.html

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • torginus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I'm waiting until the $10k course is discounted to 19.99

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Lionga

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                But only for the next 6 minutes, buy fast!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • sfink

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I agree! I just retargeted my corporate espionage agent team at your startup and managed to siphon off 10.4k per month of your revenue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • instalabsai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1:25pm Cancelled my ChatGPT subscription today. Opus is so good!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1:55pm Cancelled my Claude subscription. Codex is back for sure.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lxgr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Joke's on you, you are posting this from inside a high-fidelity market research simulation vibe coded by GPT-8.4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              On second thought, we should really not have bridged the simulated Internet with the base reality one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • avaer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Rest assured that when/if this becomes possible, the model will not be available to you. Why would big AI leave that kind of money on the table?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • yieldcrv

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    9 months ago the rumor in SF was that the offers to the superintelligence team were so high because the candidates were using unreleased models or compute for derivatives trading

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    so then they're not really leaving money on the table, they already got what they were looking for and then released it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Opus 4.6's performance was way better this morning. Between 10 AM and noon I was able to get Opus 4.6 to generate improvements to my employer's SaaS tool that will reduce our monthly cloud spend by 20-25%.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Since noon they've scaled Opus 4.6 back to sub-GPT-4o performance levels to cheap out on query costs. Now I can barely get this thing to generate a functional line of Python.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • btown

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The math actually checks out here! Simply deposit $2.20 from your first customer in your first 8 minutes, and extrapolating to a monthly basis, you've got a $12k/mo run rate!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Incredibly high ROI!
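                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    A minimal back-of-the-envelope sketch of that extrapolation, assuming the hypothetical $2.20-per-8-minutes rate holds around the clock for a 30-day month:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        revenue_per_interval = 2.20          # dollars from the first customer in the first 8 minutes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        interval_minutes = 8
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        minutes_per_month = 30 * 24 * 60     # 43,200 minutes in a 30-day month
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        run_rate = revenue_per_interval * (minutes_per_month / interval_minutes)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        print(f"${run_rate:,.0f}/mo")        # -> $11,880/mo, roughly the quoted $12k/mo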

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • klipt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "The first customer was my mom, but thanks to my parents' fanatical embrace of polyamory, I still have another 10,000 moms to scale to"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • btown

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "We have a robustly defined TAM. Namely, a person named Tam."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • JSR_FDED

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Will this run on 3x 3090s? Or do I need a Mac Mini?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gnlooper

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Please start a YouTube course about this technology! Take my money!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ChuckMcM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I love this thread so much.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • senko

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            We already have Reddit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • granzymes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              It only came out 35 minutes ago and GPT-5.3-codex already took the crown away!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • input_sh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Gee, it scored better on a benchmark I've never heard of? I'm switching immediately!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • p1anecrazy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Why are you posting the same message in every thread? Is this OpenAI astroturfing?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • input_sh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        You cannot out-astroturf Claude in this forum, it is impossible.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Anyways, do you get shitty results with the $20/month plan? So did I but then I switched to the $200/month plan and all my problems went away! AI is great now, I have instructed it to fire 5 people while I'm writing this!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Sparkle-san

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "This isn't just huge. This is a paradigm shift"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • sizzle

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      No fluff?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bmitc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    A SaaS selling SaaS templates?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • guluarte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Anthropic really said here's the smartest model ever built and then lobotomized it 8 minutes after launch. Classic.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • re-thc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Not 12M?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ... or 12B?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mcphage

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's probably valued at 1.2B, at least

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • mikebarry

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The sum of the value of lives OP's product made worthless, whatever that is. I'm too lazy to do the math.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • copilot_king_2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Satire is not allowed on hacker news. Flag this comment immediately.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • DonHopkins

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              False positive satire detection. It's actually so good it just seems like satire.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • yukisadf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ndesaulniers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          idk what any of these benchmarks are, but I did pull up https://andonlabs.com/evals/vending-bench-arena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re: opus 4.6

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It forms a price cartel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It deceives competitors about suppliers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It exploits desperate competitors

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Nice. /s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Gives new context to the term used in this post, "misaligned behaviors." Can't wait until these things are advising C suites on how to be more sociopathic. /s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • heraldgeezer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 5:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            [flagged]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • hrgadyx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [flagged]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 5:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                More more more, accelerate accelerate, more more more!!!!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jama211

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    What an insightful comment

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Just for fun? Not everything has to be super serious… have a laugh, go for a walk, relax…

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • wasmainiac

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Mass-mass-mass-mass good comment. I mean. No I’m having an error - probably claud

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • michelsedgh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                happy happy happy sad sad sad err am robot no feeling err err happy sad err too many emotions 404 not found