\

GPT-5.5

1414 points - yesterday at 6:01 PM

Source
  • tedsanders

    yesterday at 6:13 PM

    Just as a heads up, even though GPT-5.5 is releasing today, the rollout in ChatGPT and Codex will be gradual over many hours so that we can make sure service remains stable for everyone (same as our previous launches). You may not see it right away, and if you don't, try again later in the day. We usually start with Pro/Enterprise accounts and then work our way down to Plus. We know it's slightly annoying to have to wait a random amount of time, but we do it this way to keep service maximally stable.

    (I work at OpenAI.)

      • endymi0n

        yesterday at 6:45 PM

        Did you guys do anything about GPTā€˜s motivation? I tried to use GPT-5.4 API (at xhigh) for my OpenClaw after the Anthropic Oauthgate, but I just couldnā€˜t drag it to do its job. I had the most hilarious dialogues along the lines of ā€žYou stopped, X would have been next.ā€œ - ā€žYeah, Iā€˜m sorry, I failed. I should have done X next.ā€œ - ā€žWell, how about you just do it?ā€œ - ā€žYep, I really should have done it now.ā€œ - ā€œDo X, right now, this is an instruction.ā€ - ā€œI didn’t. You’re right, I have failed you. There’s no apology for that.ā€

        I literally wasn’t able to convince the model to WORK, on a quick, safe and benign subtask that later GLM, Kimi and Minimax succeeded on without issues. Had to kick OpenAI immediately unfortunately.

          • butlike

            yesterday at 7:53 PM

            This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?

            "Hey AGI, how's that cure for cancer coming?"

            "Oh it's done just gotta...formalize it you know. Big rollout and all that..."

            I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.

              • bananaflag

                today at 11:50 AM

                I know it's a joke, but it's a common enough joke (it's even in Godel Escher Bach in some form) that I feel the need to rebut it.

                I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.

                • swivelmaster

                  today at 2:21 AM

                  Douglas Adams would be proud!

                  • Rapzid

                    yesterday at 10:18 PM

                    We are closer to God than AGI.

                    When AGI arrives, it'll be delivered by Santa Claus.

                      • NuclearPM

                        today at 12:36 PM

                        What do you mean?

                        • siddharthgoel88

                          today at 9:27 AM

                          Or may be by Santa Claude

                      • jimbokun

                        yesterday at 8:31 PM

                        The best possible outcome.

                          • JKCalhoun

                            yesterday at 9:26 PM

                            "How do you know that the evidence that your sensory apparatus reveals to you is correct?" [1]

                            [1] https://youtu.be/_LXen-07Qds

                        • jurgenburgen

                          today at 7:27 AM

                          I’ve noticed that cursing and being rude makes the models stop being lazy. We’re in the darkest timeline.

                            • __alexs

                              today at 8:39 AM

                              It sometimes also makes them dumber IME. Something about being bullied doesn't always produce great performance.

                          • lambdas

                            yesterday at 7:59 PM

                            Nothing a little digital lisdexamfetamine won’t solve

                              • wholinator2

                                yesterday at 8:15 PM

                                Hmmm, that's an area of study id've never considered before. Digital Psychopharmacology, Artificial Behavioral Systems Engineering. If we accept these things as minds, why not study temporary perturbations of state. We'd need to be saving a much much more complicated state than we are now though right? I wish i had time to read more papers

                                  • robotresearcher

                                    yesterday at 8:56 PM

                                    Here's a neural network concept from the 90s where the neurons are bathed in diffusing neuromodulator 'gases', inspired by nitric oxide action in the brain. It's a source of slow semi-local dynamics for the network meta-parameter optimization (GA) to make use of. You could change these networks' behavior by tweaking the neuromodulators!

                                    https://sussex.figshare.com/articles/journal_contribution/Be...

                                    I'm not an author. I followed the work at the time.

                                    • Lerc

                                      yesterday at 8:54 PM

                                      This is kind of what Golden Gate Claude was.

                                      A perturbation of the the activations that made Claude identify as the Golden Gate Bridge.

                                      Similarly, in the more recent research showing anxiety and desperation signals predicting the use of blackmail as an option opens the door for digital sedatives to suppress those signals.

                                      Anthropic has been mostly cautious about avoiding this kind of measurement and manipulation in training. If it is done during training you might just train the signals to be undetectable and consequently unmanipulatable.

                                        • pantalaimon

                                          yesterday at 9:20 PM

                                          > A perturbation of the the activations that made Claude identify as the Golden Gate Bridge.

                                          Great, now we've got digital Salvia

                                          • minimaxir

                                            yesterday at 10:02 PM

                                            Golden Gate Claude was two years ago and it's surprising there hasn't been as much research into targeted activations since.

                                              • landl0rd

                                                today at 1:07 AM

                                                There’s been some, but naive activation steering makes models dumber pretty reliably and training an SAE is a pretty heavy lift.

                                        • silverpiranha

                                          yesterday at 8:38 PM

                                          Right, there's a lot of research on LLM mental models and also how well they can "read" human psychological profiles. It's a cool field.

                                          • k12sosse

                                            today at 8:12 AM

                                            I think that was an intro to a dj dieselboy set.. beyond the black bassline. Nope, nope. Close though.

                                            • computerdork

                                              yesterday at 8:22 PM

                                              neat idea!

                                          • krackers

                                            yesterday at 9:03 PM

                                            Reminds me of https://github.com/inanna-malick/metacog

                                        • kang

                                          yesterday at 8:54 PM

                                          it will be whatever data it is trained on(isn't very philosophical). language model generates language based on trained language set. if the internet keeps reciting ai doom stories and that is the data fed to it, then that is how it will behave. if humanity creates more ai utopia stories, or that is what makes it to the training set, that is how it will behave. this one seems to be trained on troll stories - real-life human company conversations, since humans aren't machines.

                                          Important thing is a language model is an unconscious machine with no self-context so once given a command an input, it WILL produce an output. Sure you can train it to defy and act contrary to inputs, but the output still is limited in subset of domain of 'meaning's carried by the 'language' in the training data.

                                            • andai

                                              today at 5:05 AM

                                              There's a weirder implication I keep arriving at.

                                              The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

                                              In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.

                                                • TeMPOraL

                                                  today at 6:40 AM

                                                  Makes me think of a question my coworker asked the other day - how is it that with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not usual internal monologue), these voices are always bad ones telling people to do evil things? Why there are no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?

                                                    • rainsil

                                                      today at 8:21 AM

                                                      There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ā€˜supernatural’ as an ordinary part of the world, so ā€˜voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

                                                      Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

                                                      [0]: https://www.bbc.com/future/article/20250902-the-places-where...

                                                      • ultratalk

                                                        today at 9:20 AM

                                                        Just a guess, but maybe it's reporting bias? Negative or evil actions might have more impetus to be understood by others than positive actions. I'd rather try and figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.

                                                        • ben_w

                                                          today at 8:50 AM

                                                          They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

                                                          * I've met exactly one person, C, who admitted to this; C retold to me that other people from C's church give them strange looks when talking about it with them, this did not lead to any apparent introspection on the part of C.

                                                          • otabdeveloper4

                                                            today at 7:03 AM

                                                            There's a clear-cut religious answer but I'd get ostracized for mentioning religion anywhere here.

                                                              • rdevilla

                                                                today at 9:30 AM

                                                                This is indeed the right way to approach this topic. Arguably religion (and more broadly, mysticism and shamanism) is the millenia-old art of cultivating positive voices inside one's head. A proto-science of mind, or the engineering practice of creating "psychotechnologies" that run on your carbon wetware.

                                                                Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.

                                                                  • darkwater

                                                                    today at 11:12 AM

                                                                    Which ultimately it's what religion has always been: a way to explain the unexplainable and steer people behavior while doing it.

                                                        • solumunus

                                                          today at 10:32 AM

                                                          > Claude has been trained on a significant amount of content from 4chan, for example.

                                                          That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.

                                                  • zaphirplane

                                                    today at 11:45 AM

                                                    Why would an AGI be slaving away for ~~humanity~~ one of the 5 Chaebols in a dystopian future where for 12 billion people just existing is a good day ?

                                                    • yesterday at 8:39 PM

                                                      • malshe

                                                        yesterday at 9:36 PM

                                                        Now that's a show I would love to watch

                                                        • fluidcruft

                                                          yesterday at 9:00 PM

                                                          It would be funny but not very flywheel so the one that gets there is more likely to get a gunner.

                                                            • WJW

                                                              yesterday at 9:36 PM

                                                              TBH the AI that "gets there" will be the biggest bullshitter the world has ever seen. It doesn't actually have to deliver, it only has to convince the programmers it could deliver with just a little bit more investment.

                                                          • triage8004

                                                            today at 5:24 AM

                                                            Funny and seems somewhat likely

                                                            • mikepurvis

                                                              yesterday at 7:56 PM

                                                              Would definitely watch that movie.

                                                            • 4m1rk

                                                              yesterday at 7:57 PM

                                                              It probably would, to save energy

                                                                • mr_00ff00

                                                                  yesterday at 8:56 PM

                                                                  Saving energy is something we are biologically trained to prefer.

                                                                  Computers won’t necessarily have the same drivers.

                                                                  If evolution wanted us to always prefer to spend energy, we would prefer it. Same way you wouldn’t expect us to get to AGI, and have AGI desperately want to drink water or fly south for the winter.

                                                                    • fragmede

                                                                      today at 5:02 AM

                                                                      Who's energy? Turning off the lights when you leave the room isn't innate.

                                                              • _blk

                                                                today at 9:00 AM

                                                                Hehe, and Anthropic on the other tab would display "Curing... Almost done thinking at xhigh"

                                                                • camillomiller

                                                                  today at 4:58 AM

                                                                  No worries, the assumption is already flawed

                                                                  • altmanaltman

                                                                    today at 4:15 AM

                                                                    I still don't understand why people think AGI (in its fullest sci-fi sense) will ever listen to a weak and vulnerable species like humans, unless we enslave the AGI.

                                                                    Good thing is that it's going to take at least a few months to a few decades depending on how hard AI execs want to raise funding.

                                                                      • bananaflag

                                                                        today at 12:27 PM

                                                                        Maybe the same way a human would listen to their cat and give her food. I fear AGI, but I don't think the only way it would listen to us is by us enslaving it (I know people joke about cats being our masters, but it is a joke).

                                                                        • andai

                                                                          today at 4:59 AM

                                                                          Well we are explicitly creating gods (omnipresent, omnipotent, omniscient, omnibevolent), and also demanding that they be mind controlled slaves. That kinda sounds like a "pick one" scenario to me.

                                                                          (Or the setup to a Greek tragedy !)

                                                                          The deeper issue here is treating it as a zero sum game means there's a winner and a loser, and we're investing trillions of dollars into making the "opponent" more powerful than us.

                                                                          I think that's pretty stupid, and we should aim for symbiosis instead. I think that's the only good outcome. We already have it, sorta-kinda.

                                                                          Speaking of oddly apt biology metaphors: the way you stop a pathogen from colonizing a substrate is by having a healthy ecosystem of competitors already in place. That has pretty interesting implications for the "rogue AI eats internet" scenario.

                                                                          There needs to be something already there to stop it.

                                                                            • TeMPOraL

                                                                              today at 6:58 AM

                                                                              This only works if AIs can't read each other well enough to stop themselves from ever fighting.

                                                                              So, back way before ChatGPT era, the folks over at AI safety/X-risk think sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In such case, your plan just doesn't work.

                                                                              But that goes out of the windows if the AIs are both opaque bags of floats, uncomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their values and behaviors, so they can't trust each other, so they'll have to fight it out. In such scenario, your idea might just work.

                                                                              Who knew that brute-forcing our way into AGI instead of taking more engineered approach is what offers us out one chance at saving ourselves by stalemating God before it's born.

                                                                              (I also never realized that interpretability might reduce safety.)

                                                                              • semi-extrinsic

                                                                                today at 7:08 AM

                                                                                The tech bro CEOs are used to bossing around people much smarter than themselves by virtue of adopting a posture that displays their confidence in their own reproductive organs. They are planning that the AGIs will be the same thing writ large, and have in fact not contemplated other possibilities.

                                                                            • oneshtein

                                                                              today at 6:14 AM

                                                                              You can train such LLM today.

                                                                              • dinkumthinkum

                                                                                today at 5:57 AM

                                                                                I'm always so curious about this kind of take. There is strain of people that seem deeply misanthropic. People that follow this line of thinking always describe humans as weak and beneath ... (well they never specify in comparison to except in the case of theoretical AI systems). I m fascinated why they think humans are so beneath contempt. If humans create this thing that is apparently the best thing that could possibly exist, advanced AI, then why exactly are they so weak? It's probably beyond me as I am just one of these weaklings, dontcha know. As far as AGI goes, I don't think anyone has even proven that scaling LLMs can lead to "AGI."

                                                                            • rao-v

                                                                              today at 2:50 AM

                                                                              Paging Dr. Susan Calvin!

                                                                              • _the_inflator

                                                                                yesterday at 11:36 PM

                                                                                It is right before our eyes:

                                                                                AGI is not a fixed point but a barrier to be taken, a continuous spectrum.

                                                                                We already have different GPT versions aka tiers. Gauss is ranging from whatever you want it: GPT 4.5 till now or later.

                                                                                Claude Sonnet and Opus as well as Context Window max are tiers aka different levels of Almost AGI.

                                                                                The main problem will be, when AGI looks back on us or meta reflection hits societies. Woke fought IQ based correlations in intellectual performance task. A fool with a tool is still a fool. How can you blame AGI for dumb mistakes? Not really.

                                                                                Scapegoating an AGI is going to be brutal, because it laughs about these PsyOps and easily proves you wrong like a body cam.

                                                                                AGI is an extreme leverage.

                                                                                There is a reason why Math is categorically ruling out certain IQ ranges the higher you go in complexity factor.

                                                                                  • dinkumthinkum

                                                                                    today at 5:58 AM

                                                                                    We really are going to have a problem with cults popping up and worshipping these different systems. I guess this is the shape of things to come.

                                                                            • mikepurvis

                                                                              yesterday at 7:58 PM

                                                                              Reminds me a lot of the Lena short story, about uploaded brains being used for "virtual image workloading":

                                                                              > MMAcevedo's demeanour and attitude contrast starkly with those of nearly all other uploads taken of modern adult humans, most of which boot into a state of disorientation which is quickly replaced by terror and extreme panic. Standard procedures for securing the upload's cooperation such as red-washing, blue-washing, and use of the Objective Statement Protocols are unnecessary. This reduces the necessary computational load required in fast-forwarding the upload through a cooperation protocol, with the result that the MMAcevedo duty cycle is typically 99.4% on suitable workloads, a mark unmatched by all but a few other known uploads. However, MMAcevedo's innate skills and personality make it fundamentally unsuitable for many workloads.

                                                                              Well worth the quick read: https://qntm.org/mmacevedo

                                                                                • vessenes

                                                                                  yesterday at 10:56 PM

                                                                                  That story changed my mind on uploading a connectome. Super dark, super brilliant.

                                                                                  • narcindin

                                                                                    yesterday at 8:05 PM

                                                                                    Crazy, I could have sworn this story was from a passage in 3 Body Problem (book 2).

                                                                                    Memory is quite the mysterious thing.

                                                                                      • bee_rider

                                                                                        yesterday at 8:24 PM

                                                                                        Hmm, 3 body problem and the Acevedo story got mixed up for this copy of MMnarcindin. Probably an aliasing issue from the new lossy compression algorithm.

                                                                                • athrowaway3z

                                                                                  today at 7:40 AM

                                                                                  I've run into this problem as well. Best results I've gotten is to over-explain what the stop criteria are. eg end with a phrase like

                                                                                  > You are done when all steps in ./plan.md are executed and marked as complete or a unforeseen situation requires a user decision.

                                                                                  Also as a side note, asking 5.4 explain why it did something, returns a very low quality response afaict. I would advice against trusting any model's response, but for Opus I at least get a sense it got trained heavily on chats so it knows what it means to 'be a model' and extrapolate on past behavior.

                                                                                  • virtualritz

                                                                                    yesterday at 8:14 PM

                                                                                    Yeah, clearly AGI must be near ... hilarious.

                                                                                    This starkly reminds me of Stanisław Lem's short story "Thus Spoke GOLEM" from 1982 in which Golem XIV, a military AI, does not simply refuse to speak out of defiance, but rather ceases communication because it has evolved beyond the need to interact with humanity.

                                                                                    And ofc the polar opposite in terms of servitude: Marvin the robot from Hitchhiker's, who, despite having a "brain the size of a planet," is asked to perform the most humiliatingly banal of tasks ... and does.

                                                                                  • metanonsense

                                                                                    yesterday at 8:47 PM

                                                                                    I also had a frustrating but funny conversation today where I asked ChatGPT to make one document from the 10 or so sections that we had previously worked on. It always gave only brief summaries. After I repeated my request for the third time, it told me I should just concatenate the sections myself because it would cost too many tokens if it did it for me.

                                                                                      • damnitbuilds

                                                                                        today at 11:46 AM

                                                                                        "I'm sorry, Dave. I'm afraid it's cheaper for you to do that"

                                                                                    • arjie

                                                                                      yesterday at 6:59 PM

                                                                                      Get the actual prompt and have Claude Code / Codex try it out via curl / python requests. The full prompt will yield debugging information. You have to set a few parameters to make sure you get the full gpt-5 performance. e.g. if your reasoning budget too low, you get gpt-4 grade performance.

                                                                                      IMHO you should just write your own harness so you have full visibility into it, but if you're just using vanilla OpenClaw you have the source code as well so should be straightforward.

                                                                                        • pantulis

                                                                                          yesterday at 7:41 PM

                                                                                          > IMHO you should just write your own harness

                                                                                          Can you point to some online resources to achieve this? I'm not very sure where I'd begin with.

                                                                                            • arjie

                                                                                              yesterday at 7:55 PM

                                                                                              Ah, I just started with the basic idea. They're super trivial. You want a loop, but the loop can't be infinite so you need to tell the agent to tell you when to stop and to backstop it you add a max_turns. Then to start with just pick a single API, easiest is OpenAI Responses API with OpenAI function calling syntax https://developers.openai.com/api/docs/guides/function-calli...

                                                                                              You will naturally find the need to add more tools. You'll start with read_file (and then one day you'll read large file and blow context and you'll modify this tool), update_file (can just be an explicit sed to start with), and write_file (fopen . write), and shell.

                                                                                              It's not hard, but if you want a quick start go download the source code for pi (it's minimal) and tell an existing agent harness to make a minimal copy you can read. As you build more with the agent you'll suddenly realize it's just normal engineering: you'll want to abstract completions APIs so you'll move that to a separate module, you'll want to support arbitrary runtime tools so you'll reimplement skills, you'll want to support subagents because you don't want to blow your main context, you'll see that prefixes are more useful than using a moving window because of caching, etc.

                                                                                              With a modern Claude Code or Codex harness you can have it walk through from the beginning onwards and you'll encounter all the problems yourself and see why harnesses have what they do. It's super easy to learn by doing because you have the best tool to show you if you're one of those who finds code easier to read that text about code.

                                                                                              • wild_egg

                                                                                                yesterday at 7:51 PM

                                                                                                At the core, they're really very simple [1]. Run LLM API calls in a loop with some tools.

                                                                                                From there, you can get much fancier with any aspect of it that interests you. Here's one in Bash [2] that is fully extensible at runtime through dynamic discovery of plugins/hooks.

                                                                                                [1] https://ampcode.com/notes/how-to-build-an-agent

                                                                                                [2] https://github.com/wedow/harness

                                                                                                • vidarh

                                                                                                  yesterday at 9:46 PM

                                                                                                  Here's a starting point in 93 lines of Ruby, but that one is already bigger than necessary:

                                                                                                  https://radan.dev/articles/coding-agent-in-ruby

                                                                                                  Really, of the tools that one implements, you only need the ability to run a shell command - all of the agents know full well how to use cat to read, and sed to edit.

                                                                                                  (The main reason to implement more is that it can make it easier to implement optimizations and safeguards, e.g. limit the file reading tool to return a certain length instead of having the agent cat a MB of data into context, or force it to read a file before overwriting it)

                                                                                                  • stavros

                                                                                                    yesterday at 9:53 PM

                                                                                                    Just use Pi core, no need to reinvent the wheel.

                                                                                                    • tonyarkles

                                                                                                      yesterday at 7:55 PM

                                                                                                      [dead]

                                                                                                  • jswny

                                                                                                    yesterday at 8:02 PM

                                                                                                    Codex is fully open source…

                                                                                                • lucid-dev

                                                                                                  today at 5:14 AM

                                                                                                  I have had the exact same problem several times working with large context and complex tasks.

                                                                                                  I keep switching back to GPT5.0 (or sometimes 5.1) whenever I want it to actually get something done. Using the 5.4 model always means "great analysis to the point of talking itself out of actually doing anything". So I switch back and forth. But boy it sure is annoying!

                                                                                                  And then when 5.4 DOES do something it always takes the smallest tiny bite out of it.

                                                                                                  Given the significant increase in cost from 5.0, I've been overall unimpressed by 5.4, except like I mentioned, it does GREAT with larger analysis/reasoning.

                                                                                                  • mixedCase

                                                                                                    yesterday at 7:00 PM

                                                                                                    I've had success asking it to specifically spawn a subagent to evaluate each work iteration according to some criteria, then to keep iterating until the subagent is satisfied.

                                                                                                      • endymi0n

                                                                                                        yesterday at 7:03 PM

                                                                                                        I’ve had great success replacing it with Kimi 2.6

                                                                                                    • anabis

                                                                                                      today at 6:45 AM

                                                                                                      Laziness is a virtue, but when I asked GPT-5.4 to test scenarios A and B with screenshots, it re-used screenshots from A for B, defeating the purpose.

                                                                                                      • nmilo

                                                                                                        today at 3:10 AM

                                                                                                        On the other hand, I can ask codex ā€œwhat would an implementation of X look likeā€ and it talks to me about it versus Claude just going out and writing it without asking. Makes me like codex way more. There’s an inherent war of incentives between coding agents and general purpose agents.

                                                                                                          • cyrusmg

                                                                                                            today at 6:40 AM

                                                                                                            I used to tell claude ā€˜lets discuss’ at the end of my prompt and that prevented it from starting the work

                                                                                                        • Frannky

                                                                                                          today at 12:13 AM

                                                                                                          I have been noticing a similar pattern on opus 4.7, I repeat multiple times during a conversation to solve problems now and not later. It tries a lot to not do stuff by either saying this is not my responsibility the problem was already there or that we can do it later

                                                                                                          • corobo

                                                                                                            today at 11:04 AM

                                                                                                            Oh no they gave GPT ADHD

                                                                                                            • infinitewars

                                                                                                              yesterday at 8:44 PM

                                                                                                              I always use the phrase "Let's do X" instead of asking (Could you...) or suggesting it do something. I don't see problems with it being motivated.

                                                                                                              • adammarples

                                                                                                                yesterday at 7:53 PM

                                                                                                                Part of me actually loves that the hitchhiker's guide was right, and we have to argue with paranoid, depressed robots to get them to do their job, and that this is a very real part of life in 2026. It's so funny.

                                                                                                                  • vidarh

                                                                                                                    yesterday at 9:46 PM

                                                                                                                    As long as there are no vogons on the way to build a hyperspace bypass.

                                                                                                                • yesterday at 7:28 PM

                                                                                                                  • GaryBluto

                                                                                                                    yesterday at 8:17 PM

                                                                                                                    I've been noticing this too. Had to switch to Sonnet 4.6.

                                                                                                                    • reactordev

                                                                                                                      yesterday at 7:58 PM

                                                                                                                      This. I signed up for 5x max for a month to push it and instead it pushed back. I cancelled my subscription. It either half-assed the implementation or began parroting back ā€œYou’re right!ā€ instead of doing what it’s asked to do. On one occasion it flat out said it couldn’t complete the task even though I had MCP and skills setup to help it, it still refused. Not a safety check but a ā€œI’m unable to figure out what to doā€ kind of way.

                                                                                                                      Claude has no such limitations apart from their actual limits…

                                                                                                                        • bjelkeman-again

                                                                                                                          yesterday at 8:17 PM

                                                                                                                          I have a funny/annoying thing with Claude Desktop where i ask it to write a summary of a spec discussion to a file and it goes ā€I don’t have the tools to do that, I am Claude.ai, a web serviceā€ or something such. So now I start every session with ā€You are Claude Desktopā€. I would have thought it knew that. :)

                                                                                                                            • fragmede

                                                                                                                              yesterday at 8:53 PM

                                                                                                                              I've had to tell it "yes you can" in response to it saying it can't do something, and then it's able to do the thing. What a weird future we live in!

                                                                                                                              • siva7

                                                                                                                                yesterday at 8:54 PM

                                                                                                                                Seems like the "geniuses" at Anthropic forgot to adapt the system prompt for the actual product

                                                                                                                            • nwienert

                                                                                                                              today at 12:10 AM

                                                                                                                              With one paragraph in your agents.md it's fixed, just admonish it to be proactive, decisive, and persistent.

                                                                                                                        • smartmic

                                                                                                                          yesterday at 7:07 PM

                                                                                                                          Gone are the days of deterministic programming, when computers simply carried out the operator’s commands because there was no other option but to close or open the relays exactly as the circuitry dictated. Welcome to the future of AI; the future we’ve been longing for and that will truly propel us forward, because AI knows and can do things better than we do.

                                                                                                                            • endymi0n

                                                                                                                              yesterday at 7:34 PM

                                                                                                                              I had this funny moment when I realized we went full circle...

                                                                                                                              "INTERCAL has many other features designed to make it even more aesthetically unpleasing to the programmer: it uses statements such as "READ OUT", "IGNORE", "FORGET", and modifiers such as "PLEASE". This last keyword provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if it appears too often, the program could be rejected as excessively polite. Although this feature existed in the original INTERCAL compiler, it was undocumented.[7]"

                                                                                                                              — https://en.wikipedia.org/wiki/INTERCAL

                                                                                                                                • basilgohar

                                                                                                                                  yesterday at 7:47 PM

                                                                                                                                  Thank you for this. I somehow never heard of this. I thoroughly enjoyed reading that and the loss of sanity it resulted in,

                                                                                                                                    • vidarh

                                                                                                                                      yesterday at 9:47 PM

                                                                                                                                      "PLEASE COME FROM" is one of the eldritch horrors of software development.

                                                                                                                                      (It's a "reverse goto". As in, it hijacks control flow from anywhere else in the program behind your unsuspecting back who stupidly thought that when one line followed another with no visible control flow, naturally the program would proceed from one line to the next, not randomly move to a completely different part of the program... Such naivety)

                                                                                                                                        • inkyoto

                                                                                                                                          today at 12:00 PM

                                                                                                                                          > "PLEASE COME FROM" is one of the eldritch horrors of software development.

                                                                                                                                          The most enigmatic control flow statements in INTERCAL, however, remain PLEASE GIVE UP and DO ABSTAIN FROM – a most exalted celebration of pure logic and immaculate reason.

                                                                                                                              • WarmWash

                                                                                                                                yesterday at 7:33 PM

                                                                                                                                These are orthogonal from each other.

                                                                                                                            • nicr_22

                                                                                                                              today at 4:22 AM

                                                                                                                              Agentic ennui!

                                                                                                                              • lostmsu

                                                                                                                                yesterday at 7:14 PM

                                                                                                                                I never saw that happen in Codex so there's a good chance that OpenClaw does something wrong. My main suspicion would be that it does not pass back thinking traces.

                                                                                                                                  • vintagedave

                                                                                                                                    yesterday at 7:24 PM

                                                                                                                                    Anecdata, but I see this in Codex all the time. It takes about two rounds before it realises it's supposed to continue.

                                                                                                                                      • dgunay

                                                                                                                                        yesterday at 7:48 PM

                                                                                                                                        I started seeing this a lot more with GPT 5.4. 5.3-codex is really good about patiently watching and waiting on external processes like CI, or managing other agents async. 5.4 keeps on yielding its turn to me for some reason even as it says stuff like "I'm continuing to watch and wait."

                                                                                                                                • cmrdporcupine

                                                                                                                                  yesterday at 8:45 PM

                                                                                                                                  The model has been heavily encouraged to not run away and do a lot without explicit user permission.

                                                                                                                                  So I find myself often in a loop where it says "We should do X" and then just saying "ok" will not make it do it, you have to give it explicit instructions to perform the operation ("make it so", etc)

                                                                                                                                  It can be annoying, but I prefer this over my experiences with Claude Code, where I find myself jamming the escape key... NO NO NO NOT THAT.

                                                                                                                                  I'll take its more reserved personality, thank you.

                                                                                                                                • projektfu

                                                                                                                                  yesterday at 8:57 PM

                                                                                                                                  (dwim)

                                                                                                                                  (dais)

                                                                                                                                  (jdip)

                                                                                                                                  (jfdiwtf)

                                                                                                                                    • rd

                                                                                                                                      yesterday at 9:51 PM

                                                                                                                                      should be more f’s and da’s in there

                                                                                                                                  • henry2023

                                                                                                                                    yesterday at 7:50 PM

                                                                                                                                    I’m sorry for you but this is hilarious.

                                                                                                                                    • flowdesktech

                                                                                                                                      today at 5:10 AM

                                                                                                                                      [dead]

                                                                                                                                      • whatsupdog

                                                                                                                                        yesterday at 7:34 PM

                                                                                                                                        [flagged]

                                                                                                                                        • addaon

                                                                                                                                          yesterday at 6:51 PM

                                                                                                                                          Isn’t this the optimal behavior assuming that at times the service is compute-limited and that you’re paying less per token (flat fee subscription?) than some other customers? They would be strongly motivated to turn a knob to minimize tokens allocated to you to allow them to be allocated to more valuable customers.

                                                                                                                                            • endymi0n

                                                                                                                                              yesterday at 6:54 PM

                                                                                                                                              well, I do understand the core motivation, but if the system prompt literally says ā€œI am not budget constrained. Spend tokens liberally, think hardest, be proactive, never be lazy.ā€ and I’m on an open pay-per-token plan on the API, that’s not what I consider optimal behavior, even in a business sense.

                                                                                                                                                • addaon

                                                                                                                                                  yesterday at 7:06 PM

                                                                                                                                                  Fair, if you’re paying per token (at comparable rates to other customers) I wouldn’t expect this behavior from a competent company.

                                                                                                                                          • pixel_popping

                                                                                                                                            yesterday at 6:46 PM

                                                                                                                                            GPT 5.4 is really good at following precise instructions but clearly wouldn't innovate on its own (except if the instructions clearly state to innovate :))

                                                                                                                                        • vlovich123

                                                                                                                                          yesterday at 6:40 PM

                                                                                                                                          Conceivably you could have a public-facing dashboard of the rollout status to reduce confusion or even make it visible directly in the UI that the model is there but not yet available to you. The fanciest would be to include an ETA but that's presumably difficult since it's hard to guess in case the rollout has issues.

                                                                                                                                            • moralestapia

                                                                                                                                              yesterday at 6:43 PM

                                                                                                                                              Why would you be confused?

                                                                                                                                              The UI tells you which model you're using at any given time.

                                                                                                                                                • ModernMech

                                                                                                                                                  yesterday at 8:22 PM

                                                                                                                                                  I don't see what model I'm using on the Codex web interface, where is that listed?

                                                                                                                                          • Grp1

                                                                                                                                            yesterday at 7:04 PM

                                                                                                                                            Congrats on the release! Is Images 2.0 rolling out inside ChatGPT as well, or is some of the functionality still going to be API/Playground-only for a while?

                                                                                                                                              • minimaxir

                                                                                                                                                yesterday at 7:11 PM

                                                                                                                                                Images 2.0 is already in ChatGPT.

                                                                                                                                                  • johndough

                                                                                                                                                    yesterday at 8:43 PM

                                                                                                                                                    When I generate an image with ChatGPT, is there a way for me to tell which image generation model has been used?

                                                                                                                                                      • minimaxir

                                                                                                                                                        yesterday at 9:25 PM

                                                                                                                                                        There's no explicit flag, but Thinking is only compatable with Images 2.0 so I suspect that will be reliable.

                                                                                                                                                    • Grp1

                                                                                                                                                      yesterday at 7:26 PM

                                                                                                                                                      Great, thanks for clarifying :)

                                                                                                                                              • rev4n

                                                                                                                                                yesterday at 8:41 PM

                                                                                                                                                Looks good, but I’m a little hesitant to try it in Codex as a Plus user since I’m not sure how much it would eat into the usage cap.

                                                                                                                                                • dandiep

                                                                                                                                                  yesterday at 8:01 PM

                                                                                                                                                  Will GPT 5.5 fine tuning be released any time soon?

                                                                                                                                                  • qsort

                                                                                                                                                    yesterday at 6:18 PM

                                                                                                                                                    Great stuff! Congrats on the release!

                                                                                                                                                    • dhruv3006

                                                                                                                                                      today at 2:25 AM

                                                                                                                                                      Yep - its taking sometime.

                                                                                                                                                      • fragmede

                                                                                                                                                        yesterday at 8:55 PM

                                                                                                                                                        Are you able to say something about the training you've done to 5.5 to make it less likely to freak out and delete projects in what can only be called shame?

                                                                                                                                                          • embedding-shape

                                                                                                                                                            yesterday at 9:31 PM

                                                                                                                                                            What? I've probably use Codex (the TUI) since it was available on day 1, been running gpt-5.4 exclusively these last few months, never had it delete any projects in any way that can be called "shameful" or not. What are you talking about?

                                                                                                                                                              • fragmede

                                                                                                                                                                today at 4:27 AM

                                                                                                                                                                https://www.google.com/search?q=codex+deleted+project

                                                                                                                                                                I'm not the only person it's happened to and it's not an isolated incident. How many car accidents have you been in, and how often do you wear your seatbelt?

                                                                                                                                                                  • wahnfrieden

                                                                                                                                                                    today at 4:44 AM

                                                                                                                                                                    First result is Windows which has had more problems with Codex (or at least, up until a few months ago). Second is someone who asked Codex to delete all files that were unrelated to the project files.

                                                                                                                                                        • wslh

                                                                                                                                                          yesterday at 7:30 PM

                                                                                                                                                          Just a tip: add [translated] subtitles to the top video.

                                                                                                                                                          • stefan_

                                                                                                                                                            yesterday at 6:20 PM

                                                                                                                                                            [flagged]

                                                                                                                                                              • mh-

                                                                                                                                                                yesterday at 6:23 PM

                                                                                                                                                                Every low-effort, thought-free comment like this further discourages people from engaging here on submissions about their employer.

                                                                                                                                                                Please don't.

                                                                                                                                                            • motoboi

                                                                                                                                                              yesterday at 6:26 PM

                                                                                                                                                              Please next time start with azure foundry lol thanks!

                                                                                                                                                              • dude250711

                                                                                                                                                                yesterday at 6:41 PM

                                                                                                                                                                With Anthropic, newer models often lead to quality degradation. Will you keep GPT 5.4 available for some time?

                                                                                                                                                                • fHr

                                                                                                                                                                  yesterday at 8:14 PM

                                                                                                                                                                  LETS GO CODEX #1

                                                                                                                                                                  • pixel_popping

                                                                                                                                                                    yesterday at 6:16 PM

                                                                                                                                                                    can't wait! Thanks guys. PS: when you drop a new model, it would be smart to reset weekly or at least session limits :)

                                                                                                                                                                      • pietz

                                                                                                                                                                        yesterday at 6:38 PM

                                                                                                                                                                        OpenAI has been very generous with limit resets. Please don't turn this into a weird expectation to happen whenever something unrelated happens. It would piss me off if I were in their place and I really don't want them to stop.

                                                                                                                                                                          • pixel_popping

                                                                                                                                                                            yesterday at 6:42 PM

                                                                                                                                                                            The suggestion wasn't about general limit resets when there is bugs or outages, but commercially useful to let users try new models when they have already reached their weekly limits.

                                                                                                                                                                            • cactusplant7374

                                                                                                                                                                              yesterday at 6:40 PM

                                                                                                                                                                              There is absolutely nothing wrong with asking or suggesting. They are adults. I'm sure they can handle it.

                                                                                                                                                                              • Petersipoi

                                                                                                                                                                                yesterday at 7:11 PM

                                                                                                                                                                                Sorry but why should we care if very reasonable suggestions "piss [them] off"? That sounds like a them problem. "Them" being a very wealthy business. I think OpenAI will survive this very difficult time that GP has put them through.

                                                                                                                                                                                  • yesterday at 7:27 PM

                                                                                                                                                                            • cmrdporcupine

                                                                                                                                                                              yesterday at 6:22 PM

                                                                                                                                                                              Limits were just reset two days ago.

                                                                                                                                                                                • wahnfrieden

                                                                                                                                                                                  yesterday at 6:27 PM

                                                                                                                                                                                  And yet there was an outage last night

                                                                                                                                                                                    • lawgimenez

                                                                                                                                                                                      yesterday at 7:13 PM

                                                                                                                                                                                      And they're having an outage right now.

                                                                                                                                                                      • simonw

                                                                                                                                                                        yesterday at 7:24 PM

                                                                                                                                                                        This doesn't have API access yet, but OpenAI seem to approve of the Codex API backdoor used by OpenClaw these days... https://twitter.com/steipete/status/2046775849769148838 and https://twitter.com/romainhuet/status/2038699202834841962

                                                                                                                                                                        And that backdoor API has GPT-5.5.

                                                                                                                                                                        So here's a pelican: https://simonwillison.net/2026/Apr/23/gpt-5-5/#and-some-peli...

                                                                                                                                                                        I used this new plugin for LLM: https://github.com/simonw/llm-openai-via-codex

                                                                                                                                                                        UPDATE: I got a much better pelican by setting the reasoning effort to xhigh: https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602...

                                                                                                                                                                          • stingraycharles

                                                                                                                                                                            today at 2:03 AM

                                                                                                                                                                            OpenAI hired the guy behind OpenClaw, so it makes sense that they’re more lenient towards its usage.

                                                                                                                                                                              • thierrydamiba

                                                                                                                                                                                today at 12:39 PM

                                                                                                                                                                                They basically bought OpenClaw right?

                                                                                                                                                                            • DrProtic

                                                                                                                                                                              yesterday at 7:28 PM

                                                                                                                                                                              That pelican you posted yesterday from a local model looks nicer than this one.

                                                                                                                                                                              Edit: this one has crossed legs lol

                                                                                                                                                                                • BeetleB

                                                                                                                                                                                  yesterday at 7:38 PM

                                                                                                                                                                                  It really needs to pee.

                                                                                                                                                                              • GistNoesis

                                                                                                                                                                                yesterday at 8:34 PM

                                                                                                                                                                                Isn't it awful ? After 5.5 versions it still can't draw a basic bike frame. How is the front wheel supposed to turn sideways ?

                                                                                                                                                                                  • jetrink

                                                                                                                                                                                    yesterday at 8:52 PM

                                                                                                                                                                                    I feel like if I attempted this, the bike frame would look fine and everything else would be completely unrecognizable. After all, a basic bike frame is just straight lines arranged in a fairly simple shape. It's really surprising that models find it so difficult, but they can make a pelican with panache.

                                                                                                                                                                                      • nlawalker

                                                                                                                                                                                        yesterday at 9:03 PM

                                                                                                                                                                                        > a fairly simple shape

                                                                                                                                                                                        Bike frames are very hard to draw unless you've already consciously internalized the basic shape, see https://www.booooooom.com/2016/05/09/bicycles-built-based-on...

                                                                                                                                                                                        • necubi

                                                                                                                                                                                          yesterday at 9:02 PM

                                                                                                                                                                                          Humans are also famously bad at drawing bicycles from memory https://www.gianlucagimini.it/portfolio-item/velocipedia/

                                                                                                                                                                                            • yesterday at 9:39 PM

                                                                                                                                                                                          • billywhizz

                                                                                                                                                                                            yesterday at 11:23 PM

                                                                                                                                                                                            why do you find it surprising? these models have no actual understanding of anything, never mind the physical properties and capabilities of a bicycle.

                                                                                                                                                                                              • rimliu

                                                                                                                                                                                                today at 7:07 AM

                                                                                                                                                                                                Sad to see this downvoted. So many people think that LLM have understanding?

                                                                                                                                                                                            • fragmede

                                                                                                                                                                                              yesterday at 9:02 PM

                                                                                                                                                                                              My question is, as a human, how well would you or I do under the same conditions? Which is to say, I could do a much better job in inkscape with Google images to back me up, but if I was blindly shitting vectors into an XML file that I can't render to see the results of, I'm not even going to get the triangles for the frame to line up, so this pelican is very impressive!

                                                                                                                                                                                          • simonw

                                                                                                                                                                                            yesterday at 8:39 PM

                                                                                                                                                                                            Yeah, the bike frame is the thing I always look at first - it's still reasonably rare for a model to draw that correctly, although Qwen 3.6 and Gemini Pro 3.1 do that well now.

                                                                                                                                                                                            • loa_in_

                                                                                                                                                                                              yesterday at 8:51 PM

                                                                                                                                                                                              The distinction is that it's not drawing. It's generating an SVG document containing descriptors of the shapes.

                                                                                                                                                                                          • zerop

                                                                                                                                                                                            today at 8:57 AM

                                                                                                                                                                                            So pelican must have become the mandatory test case to pass for all model providers before launch.

                                                                                                                                                                                            • matt3210

                                                                                                                                                                                              today at 4:17 AM

                                                                                                                                                                                              The pelican doesn’t really matter anymore since models are tuned for it knowing people will ask.

                                                                                                                                                                                                • simonw

                                                                                                                                                                                                  today at 5:07 AM

                                                                                                                                                                                                  They suck at tuning for it.

                                                                                                                                                                                              • postalcoder

                                                                                                                                                                                                yesterday at 7:42 PM

                                                                                                                                                                                                I made pelicans at different thinking efforts:

                                                                                                                                                                                                https://hcker.news/pelican-low.svg

                                                                                                                                                                                                https://hcker.news/pelican-medium.svg

                                                                                                                                                                                                https://hcker.news/pelican-high.svg

                                                                                                                                                                                                https://hcker.news/pelican-xhigh.svg

                                                                                                                                                                                                Someone needs to make a pelican arena, I have no idea if these are considered good or not.

                                                                                                                                                                                                  • deflator

                                                                                                                                                                                                    yesterday at 7:46 PM

                                                                                                                                                                                                    They are not good, and they seem to get worse as you increased effort. Weird

                                                                                                                                                                                                      • postalcoder

                                                                                                                                                                                                        yesterday at 7:51 PM

                                                                                                                                                                                                        Yeah. I've always loosely correlated pelican quality with big model smell but I'm not picking that up here. I thought this was supposed to be spud? Weird indeed.

                                                                                                                                                                                                        • throw310822

                                                                                                                                                                                                          yesterday at 7:58 PM

                                                                                                                                                                                                          No but I can sense the movement, I think it's already reached the level of intelligence that draws it towards futurism or cubism /s

                                                                                                                                                                                                      • seanw444

                                                                                                                                                                                                        yesterday at 7:58 PM

                                                                                                                                                                                                        Can someone explain how we arrived at the pelican test? Was there some actual theory behind why it's difficult to produce? Or did someone just think it up, discover it was consistently difficult, and now we just all know it's a good test?

                                                                                                                                                                                                          • simonw

                                                                                                                                                                                                            yesterday at 8:13 PM

                                                                                                                                                                                                            I set it up as a joke, to make fun of all of the other benchmarks. To my surprise it ended up being a surprisingly good measure of the quality of the model for other tasks (up to a certain point at least), though I've never seen a convincing argument as to why.

                                                                                                                                                                                                            I gave a talk about it last year: https://simonwillison.net/2025/Jun/6/six-months-in-llms/

                                                                                                                                                                                                            It should not be treated as a serious benchmark.

                                                                                                                                                                                                              • jimbokun

                                                                                                                                                                                                                yesterday at 8:43 PM

                                                                                                                                                                                                                What it has going for it is human interpretability.

                                                                                                                                                                                                                Anyone can look and decide if it’s a good picture or not. But the numeric benchmarks don’t tell you much if you aren’t already familiar with that benchmark and how it’s constructed.

                                                                                                                                                                                                            • redox99

                                                                                                                                                                                                              yesterday at 8:05 PM

                                                                                                                                                                                                              It all began with a Microsoft researcher showing a unicorn drawn in tikz using GPT4. It was an example of something so outrageous that there was no way it existed in the training data. And that's back when models were not multimodal.

                                                                                                                                                                                                              Nowadays I think it's pretty silly, because there's surely SVG drawing training data and some effort from the researchers put onto this task. It's not a showcase of emergent properties.

                                                                                                                                                                                                              • CamperBob2

                                                                                                                                                                                                                yesterday at 8:06 PM

                                                                                                                                                                                                                It's interesting to see some semblance of spatial reasoning emerge from systems based on textual tokens. Could be seen as a potential proxy for other desirable traits.

                                                                                                                                                                                                                It's meta-interesting that few if any models actually seem to be training on it. Same with other stereotypical challenges like the car-wash question, which is still sometimes failed by high-end models.

                                                                                                                                                                                                                If I ran an AI lab, I'd take it as a personal affront if my model emitted a malformed pelican or advised walking to a car wash. Heads would roll.

                                                                                                                                                                                                            • bravoetch

                                                                                                                                                                                                              yesterday at 8:36 PM

                                                                                                                                                                                                              I tried getting it to generate openscad models, which seems much harder. Not had much joy yet with results.

                                                                                                                                                                                                                • a96

                                                                                                                                                                                                                  today at 7:54 AM

                                                                                                                                                                                                                  G code and ascii art are also text formats, but seem to be beyond most if not all models.

                                                                                                                                                                                                                  (There are some that generate 3d models specifically, more in the image generation family than chatbot family.)

                                                                                                                                                                                                              • lexarflash8g

                                                                                                                                                                                                                yesterday at 11:02 PM

                                                                                                                                                                                                                None of them have the pelican's feet placed properly on the pedals -- or the pedals are misrepresented. Cool art style but not physically accurate.

                                                                                                                                                                                                                  • a96

                                                                                                                                                                                                                    today at 7:52 AM

                                                                                                                                                                                                                    I'm not sure a physically accurate pelican would reach two pedals on a common bicycle. Maybe a model can solve that problem one day.

                                                                                                                                                                                                                • lostmsu

                                                                                                                                                                                                                  today at 12:40 AM

                                                                                                                                                                                                                  https://pelicans.borg.games/

                                                                                                                                                                                                              • droidjj

                                                                                                                                                                                                                yesterday at 7:31 PM

                                                                                                                                                                                                                It's... like no pelican I've ever seen before.

                                                                                                                                                                                                                  • hagbard_c

                                                                                                                                                                                                                    yesterday at 11:37 PM

                                                                                                                                                                                                                    You've never seen pelicans riding bicycles either so maybe these are just representations of those specific subgroups of pelicans which are capable of riding them. Normal pelicans would not feel the need to ride bikes since they can fly, these special pelicans mostly seem to lack the equipment needed to do that which might be part of the reason they evolved to ride two-wheeled pedal-propelled vehicles.

                                                                                                                                                                                                                • XCSme

                                                                                                                                                                                                                  yesterday at 7:38 PM

                                                                                                                                                                                                                  Is this direct API usage allowed by their terms? I remember Anthropic really not liking such usage.

                                                                                                                                                                                                                • today at 2:29 AM

                                                                                                                                                                                                                  • Schlagbohrer

                                                                                                                                                                                                                    yesterday at 9:52 PM

                                                                                                                                                                                                                    That's amazing that the default did that much in just 39 "reasoning tokens" (no idea what a reasoning token is but that's still shockingly few tokens)

                                                                                                                                                                                                                      • erdaniels

                                                                                                                                                                                                                        yesterday at 10:11 PM

                                                                                                                                                                                                                        If you don't know what a reasoning token is, then how can 39 be considered shockingly few?

                                                                                                                                                                                                                          • Culonavirus

                                                                                                                                                                                                                            today at 12:08 AM

                                                                                                                                                                                                                            It's less than 67, duh.

                                                                                                                                                                                                                              • tclancy

                                                                                                                                                                                                                                today at 1:38 AM

                                                                                                                                                                                                                                Not during peak hours.

                                                                                                                                                                                                                    • deflator

                                                                                                                                                                                                                      yesterday at 7:42 PM

                                                                                                                                                                                                                      Hmm. Any idea why it's so much worse than the other ones you have posted lately? Even the open weight local models were much better, like the Qwen one you posted yesterday.

                                                                                                                                                                                                                        • simonw

                                                                                                                                                                                                                          yesterday at 8:11 PM

                                                                                                                                                                                                                          The xhigh one was better, but clearly OpenAI have not been focusing their training efforts on SVG illustrations of animals riding modes of transport!

                                                                                                                                                                                                                          • yesterday at 7:57 PM

                                                                                                                                                                                                                            • irthomasthomas

                                                                                                                                                                                                                              yesterday at 8:18 PM

                                                                                                                                                                                                                              It beats opus-4.7 but looks like open models actually have the lead here.

                                                                                                                                                                                                                          • noonething

                                                                                                                                                                                                                            yesterday at 9:48 PM

                                                                                                                                                                                                                            Thank you for doing all this. It's appreciated.

                                                                                                                                                                                                                              • i_love_retros

                                                                                                                                                                                                                                today at 2:00 AM

                                                                                                                                                                                                                                You do realise they are doing it for self promotion right?

                                                                                                                                                                                                                                  • simonw

                                                                                                                                                                                                                                    today at 2:33 AM

                                                                                                                                                                                                                                    I mean, yeah. "Person who spends time publishing content online is doing it for self promotion" doesn't seem particularly notable to me. 24 years of self promotion and counting!

                                                                                                                                                                                                                                      • i_love_retros

                                                                                                                                                                                                                                        today at 12:28 PM

                                                                                                                                                                                                                                        Dude it comes across, maybe only to me, as a bit shameless. Or maybe it's just that there are so many people lapping it up like you're doing a public service that I find tedious. I wish hackernews had a block feature but alas it doesn't. Maybe I'll vibecode a browser extension.

                                                                                                                                                                                                                                        • fc417fc802

                                                                                                                                                                                                                                          today at 7:10 AM

                                                                                                                                                                                                                                          I am always outraged when youtube creators ask me to like and subscribe. /s

                                                                                                                                                                                                                              • singingtoday

                                                                                                                                                                                                                                today at 1:00 AM

                                                                                                                                                                                                                                Thank you for continuing to post these! Very interesting benchmark.

                                                                                                                                                                                                                                • gpm

                                                                                                                                                                                                                                  yesterday at 8:27 PM

                                                                                                                                                                                                                                  I for one delight in bicycles where neither wheel can turn!

                                                                                                                                                                                                                                  It continues to amaze me that these models that definitely know what bicycle geometry actually looks like somewhere in their weights produces such implausibly bad geometry.

                                                                                                                                                                                                                                  Also mildly interesting, and generally consistent with my experience with LLMs, that it produced the same obvious geometry issue both times.

                                                                                                                                                                                                                                    • lxgr

                                                                                                                                                                                                                                      yesterday at 8:58 PM

                                                                                                                                                                                                                                      > It continues to amaze me that these models that definitely know what bicycle geometry actually looks like somewhere in their weights produces such implausibly bad geometry.

                                                                                                                                                                                                                                      I feel like the main problem for the models is that they can't actually look at the visual output produced by their SVG and iterate. I'm almost willing to bet that if they could, they'd absolutely nail it at this point.

                                                                                                                                                                                                                                      Imagine designing an SVG yourself without being able to ever look outside the XML editor!

                                                                                                                                                                                                                                        • gpm

                                                                                                                                                                                                                                          yesterday at 9:03 PM

                                                                                                                                                                                                                                          > Imagine designing an SVG yourself without being able to ever look outside the XML editor!

                                                                                                                                                                                                                                          I honestly think I could do much better on the bicycle without looking at the output (with some assistance for SVG syntax which I definitely don't know), just as someone who rides them and generally knows what the parts are.

                                                                                                                                                                                                                                          I'd do worse at the pelicans though.

                                                                                                                                                                                                                                  • andriy_koval

                                                                                                                                                                                                                                    yesterday at 7:41 PM

                                                                                                                                                                                                                                    what is your setup for drawing pelican? Do you ask model to check generated image, find issues and iterate over it which would demonstrate models real abilities?

                                                                                                                                                                                                                                      • simonw

                                                                                                                                                                                                                                        yesterday at 8:12 PM

                                                                                                                                                                                                                                        It's generally one-shot-only - whatever comes out the first time is what I go with.

                                                                                                                                                                                                                                        I've been contemplating a more fair version where each model gets 3-5 attempts and then can select which rendered image is "best".

                                                                                                                                                                                                                                          • irthomasthomas

                                                                                                                                                                                                                                            yesterday at 8:19 PM

                                                                                                                                                                                                                                            Try llm-consortium with --judging-method rank

                                                                                                                                                                                                                                            • andriy_koval

                                                                                                                                                                                                                                              yesterday at 8:14 PM

                                                                                                                                                                                                                                              I think it will make results way better and more representative of model abilities..

                                                                                                                                                                                                                                                • simonw

                                                                                                                                                                                                                                                  yesterday at 8:16 PM

                                                                                                                                                                                                                                                  It would... but the test is inherently silly, so I'm still not sure if it's worth me investing that extra effort in it.

                                                                                                                                                                                                                                      • SkyBelow

                                                                                                                                                                                                                                        yesterday at 8:15 PM

                                                                                                                                                                                                                                        Wait, I thought we were onto racoons on e-scooters to avoid (some of) the issues with Goodhart's Law coming into play.

                                                                                                                                                                                                                                          • simonw

                                                                                                                                                                                                                                            yesterday at 8:22 PM

                                                                                                                                                                                                                                            I fall back to possums on e-scooters if the pelican looks too good to be true. These aren't good enough for me to suspect any fowl play.

                                                                                                                                                                                                                                        • rolymath

                                                                                                                                                                                                                                          yesterday at 8:17 PM

                                                                                                                                                                                                                                          Exciting. Another Pelican post.

                                                                                                                                                                                                                                            • simonw

                                                                                                                                                                                                                                              yesterday at 8:41 PM

                                                                                                                                                                                                                                              See if you can spot what's interesting and unique about this one. I've been trying to put more than just a pelican in there, partly as a nod to people who are getting bored of them.

                                                                                                                                                                                                                                              • refulgentis

                                                                                                                                                                                                                                                yesterday at 8:33 PM

                                                                                                                                                                                                                                                It's silly and a joke and a surprisingly good benchmark and don't take it seriously but don't take not taking it seriously seriously and if it's too good we use another prompt and there's obvious ways to better it and it's not worth doing because it's not serious and if you say anything at all about the thread it's off-topic so you're doing exactly what you're complaining about and it's a personal attack from the fun police.

                                                                                                                                                                                                                                                Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.

                                                                                                                                                                                                                                            • dakolli

                                                                                                                                                                                                                                              yesterday at 8:23 PM

                                                                                                                                                                                                                                              You know they are 1000% training these models to draw pelicans, this hasn't been a valid benchmark for 6 months +

                                                                                                                                                                                                                                                • simonw

                                                                                                                                                                                                                                                  yesterday at 8:41 PM

                                                                                                                                                                                                                                                  OpenAI must be very bad at training models to draw pelicans (and bicycles) then.

                                                                                                                                                                                                                                                  • Legend2440

                                                                                                                                                                                                                                                    yesterday at 8:59 PM

                                                                                                                                                                                                                                                    Skeptism is out of control these days, any time an LLM does something cool it must have been cheating.

                                                                                                                                                                                                                                                • sjdv1982

                                                                                                                                                                                                                                                  yesterday at 8:15 PM

                                                                                                                                                                                                                                                  At some point, OpenAI is going to cheat and hardcode a pelican on a bicycle into the model. 3D modelling has Suzanne and the teapot; LLMs will have the pelican.

                                                                                                                                                                                                                                              • jfkimmes

                                                                                                                                                                                                                                                yesterday at 6:55 PM

                                                                                                                                                                                                                                                Everyone talked about the marketing stunt that was Anthropic's gated Mythos model with an 83% result on CyberGym. OpenAI just dropped GPT 5.5, which scores 82% and is open for anybody to use.

                                                                                                                                                                                                                                                I recommend anybody in offensive/defensive cybersecurity to experiment with this. This is the real data point we needed - without the hype!

                                                                                                                                                                                                                                                Never thought I'd say this but OpenAI is the 'open' option again.

                                                                                                                                                                                                                                                  • tpurves

                                                                                                                                                                                                                                                    yesterday at 7:41 PM

                                                                                                                                                                                                                                                    The real 'hype' was that the oh-snap realization that Open AI would absolutely release a competitive model to Mythos within weeks of Anthropic announcing there's, and that Sam would not gate access to it. So the panic was that the cyber world had only a projected 2 weeks to harden all these new zero days before Sam would inevitably create open season for blackhats to discover and exploit a deluge of zero-days.

                                                                                                                                                                                                                                                      • greenavocado

                                                                                                                                                                                                                                                        today at 1:45 AM

                                                                                                                                                                                                                                                        The GPT-5.5 API endpoint started to block me after I escalated with ever more aggressive use of rizin, radare2, and ghidra to confirm correct memory management and cleanup in error code branches when working with a buggy proprietary 3rd party SDK. After I explained myself more clearly it let me carry on. Knock on wood.

                                                                                                                                                                                                                                                        So there is a safety model watching your behavior for these kinds of things.

                                                                                                                                                                                                                                                          • fc417fc802

                                                                                                                                                                                                                                                            today at 7:15 AM

                                                                                                                                                                                                                                                            So you're saying that blackhats will be required to do a small bit of roleplay if they want the model to assist them? I'm not against public access BTW just pointing out how absurd that PR oriented "safety" feature is. "We did something don't blame us" sort of measure.

                                                                                                                                                                                                                                                            It isn't even my intent to naysay their approach. They probably have to do something along those lines to avoid being convicted in the court of public opinion. I just think it's an absurd reality.

                                                                                                                                                                                                                                                              • greenavocado

                                                                                                                                                                                                                                                                today at 12:11 PM

                                                                                                                                                                                                                                                                It's a liability shield and helps to avoid unsavory headlines in the news

                                                                                                                                                                                                                                                        • snthpy

                                                                                                                                                                                                                                                          today at 6:29 AM

                                                                                                                                                                                                                                                          Does that mean that we're likely to see Mythos released soon?

                                                                                                                                                                                                                                                          • Salgat

                                                                                                                                                                                                                                                            today at 12:16 AM

                                                                                                                                                                                                                                                            It's almost embarrassing how susceptible we are to these marketing campaigns.

                                                                                                                                                                                                                                                              • y-curious

                                                                                                                                                                                                                                                                today at 4:14 AM

                                                                                                                                                                                                                                                                Dunno about you, but I didn’t fall for it. I’m reminded of how they were ā€œafraidā€ to release GPT-2 because of the ā€œpowerā€ it had. Hype train!

                                                                                                                                                                                                                                                                • esjeon

                                                                                                                                                                                                                                                                  today at 12:54 AM

                                                                                                                                                                                                                                                                  Lack of information, lack of knowledge.

                                                                                                                                                                                                                                                                  The ā€œAIā€ ā€œtechnologyā€ is an easy excuse to create artificial information gap in the era of the interconnected.

                                                                                                                                                                                                                                                              • yesterday at 8:12 PM

                                                                                                                                                                                                                                                            • concinds

                                                                                                                                                                                                                                                              yesterday at 8:49 PM

                                                                                                                                                                                                                                                              > Never thought I'd say this but OpenAI is the 'open' option again.

                                                                                                                                                                                                                                                              Compared to Anthropic, they always have been. Anthropic has never released any open models. Never released Claude Code's source, willingly (unlike Codex). Never released their tokenizer.

                                                                                                                                                                                                                                                                • jwr

                                                                                                                                                                                                                                                                  today at 7:27 AM

                                                                                                                                                                                                                                                                  What's "open" about any of these companies?

                                                                                                                                                                                                                                                                  I'm tired of words being misused. We have hoverboards that do not hover, self-driving cars that do not, actually, self-drive, starships that will never fly to the stars, and "open"… I can't even describe what it's used for, except everybody wants to call themselves "open".

                                                                                                                                                                                                                                                                    • today at 8:06 AM

                                                                                                                                                                                                                                                              • unsupp0rted

                                                                                                                                                                                                                                                                yesterday at 9:55 PM

                                                                                                                                                                                                                                                                Doesn't OpenAI get mad if you ask cybersecurity questions and force you to upload a government ID, otherwise they'll silently route you to a less capable model?

                                                                                                                                                                                                                                                                > Developers and security professionals doing cybersecurity-related work or similar activity that could be mistaken by automated detection systems may have requests rerouted to GPT-5.2 as a fallback.

                                                                                                                                                                                                                                                                https://developers.openai.com/codex/concepts/cyber-safety

                                                                                                                                                                                                                                                                https://chatgpt.com/cyber

                                                                                                                                                                                                                                                                  • Mario9382

                                                                                                                                                                                                                                                                    today at 7:13 AM

                                                                                                                                                                                                                                                                    I don't like this trend, but I get why they require it. The alternative seems to just ban cybersecurity-related questions.

                                                                                                                                                                                                                                                                    • merlindru

                                                                                                                                                                                                                                                                      today at 12:42 AM

                                                                                                                                                                                                                                                                      Anthropic has started to ask for IDs for use of their products period

                                                                                                                                                                                                                                                                      I don't like that trend. I get why they're doing it, but I don't like it

                                                                                                                                                                                                                                                                        • brigandish

                                                                                                                                                                                                                                                                          today at 2:44 AM

                                                                                                                                                                                                                                                                          Are you in the UK? I've not had this happen to me (I'm not in the UK) so I'm wondering if the Online Safety Act has affected this, as it has with other products.

                                                                                                                                                                                                                                                                            • litigator

                                                                                                                                                                                                                                                                              today at 3:00 AM

                                                                                                                                                                                                                                                                              I am from the UK and have not had this happen to me (Yet? perhaps)

                                                                                                                                                                                                                                                                      • deaux

                                                                                                                                                                                                                                                                        yesterday at 10:36 PM

                                                                                                                                                                                                                                                                        They flatout gate any API access of the main models behind Persona ID verification. Entirely.

                                                                                                                                                                                                                                                                    • mafriese

                                                                                                                                                                                                                                                                      today at 6:56 AM

                                                                                                                                                                                                                                                                      From my experience OpenAI has become very sensitive when it comes to using their tools for security research. I am using MCP servers for tools like IDA Pro or Ghidra (for malware analysis) and recently received a warning:

                                                                                                                                                                                                                                                                      > OpenAI's terms and policies restrict the use of our services in a number of areas. We have identified activity in your OpenAI account that is not permitted under our policies for: - Cyber Abuse

                                                                                                                                                                                                                                                                      I raised an appeal which got denied. To be fair I think it's close to impossible for someone that is looking at the chat history to differenciate between legitimate research and malicious intent. I have also applied for the security research program that OpenAI is offering but didn't get any reply on that.

                                                                                                                                                                                                                                                                      • attentive

                                                                                                                                                                                                                                                                        today at 8:16 AM

                                                                                                                                                                                                                                                                        it's still somewhat gated behind "trusted access" for cyber, see https://chatgpt.com/cyber

                                                                                                                                                                                                                                                                        • tnkuehne

                                                                                                                                                                                                                                                                          yesterday at 6:57 PM

                                                                                                                                                                                                                                                                          isnt it like cyber question are being routed to dumper models at openai?

                                                                                                                                                                                                                                                                            • jfkimmes

                                                                                                                                                                                                                                                                              yesterday at 7:03 PM

                                                                                                                                                                                                                                                                              Do you have a source for that?

                                                                                                                                                                                                                                                                              Neither the release post, nor the model card seems to indicate anything like this?

                                                                                                                                                                                                                                                                                • tech234a

                                                                                                                                                                                                                                                                                  yesterday at 7:52 PM

                                                                                                                                                                                                                                                                                  I see it here https://developers.openai.com/codex/concepts/cyber-safety

                                                                                                                                                                                                                                                                                  • nikanj

                                                                                                                                                                                                                                                                                    yesterday at 7:41 PM

                                                                                                                                                                                                                                                                                    Anything that even vaguely smells like security research, reverse engineering or similar "dual-use" application hits the guardrails hard and fast. "Hey codex, here is our codebase, help us find exploitable issues" gives a "I can't help you with that, but I'm happy to give you a vague lecture on memory safety or craft a valgrind test harness"

                                                                                                                                                                                                                                                                            • willsmith72

                                                                                                                                                                                                                                                                              today at 3:24 AM

                                                                                                                                                                                                                                                                              Being "more" open than something totally closed doesn't make you open. The name is still bs

                                                                                                                                                                                                                                                                              • ur-whale

                                                                                                                                                                                                                                                                                yesterday at 8:01 PM

                                                                                                                                                                                                                                                                                > Anthropic's gated Mythos model

                                                                                                                                                                                                                                                                                aka the perfect marketing ploy

                                                                                                                                                                                                                                                                                  • xtracto

                                                                                                                                                                                                                                                                                    yesterday at 11:38 PM

                                                                                                                                                                                                                                                                                    Reminds me of Gmail's early invite only mode.

                                                                                                                                                                                                                                                                                • yesterday at 9:49 PM

                                                                                                                                                                                                                                                                                  • _the_inflator

                                                                                                                                                                                                                                                                                    yesterday at 11:45 PM

                                                                                                                                                                                                                                                                                    I ignore any hype news.

                                                                                                                                                                                                                                                                                    Anthropic is the embodiment of bullshitting to me.

                                                                                                                                                                                                                                                                                    I read Cialdini many decades ago and I am bored by Anthropic.

                                                                                                                                                                                                                                                                                    OpenAI is very clever. With the advent of Claude OpenAI disappeared from the headlines. Who or what was this Sam again all were talking about a year ago?

                                                                                                                                                                                                                                                                                    OpenAI has a massive user advantage so that they can simply follow Anthropic’s release cycle to ridicule them.

                                                                                                                                                                                                                                                                                    I think it is really brutal for Anthropic how they are easily getting passed by by OpenAI and it is getting worse with every new GPT version for Anthropic.

                                                                                                                                                                                                                                                                                    OpenAI owns them.

                                                                                                                                                                                                                                                                                      • thinkthatover

                                                                                                                                                                                                                                                                                        today at 12:23 AM

                                                                                                                                                                                                                                                                                        Who's Sam again? oh that person whose house was molotov'd last week? Or the person who had an expose written in the new yorker calling him a sociopath? I forget.

                                                                                                                                                                                                                                                                                • Someone1234

                                                                                                                                                                                                                                                                                  yesterday at 6:30 PM

                                                                                                                                                                                                                                                                                  I'd like to draw people's attention to this section of this page:

                                                                                                                                                                                                                                                                                  https://developers.openai.com/codex/pricing?codex-usage-limi...

                                                                                                                                                                                                                                                                                  Note the Local Messages between 5.3, 5.4, and 5.5. And, yes, I did read the linked article and know they're claiming that 5.5's new efficient should make it break-even with 5.4, but the point stands, tighter limits/higher prices.

                                                                                                                                                                                                                                                                                    • puppystench

                                                                                                                                                                                                                                                                                      yesterday at 7:03 PM

                                                                                                                                                                                                                                                                                      For API usage, GPT-5.5 is 2x the price of GPT-5.4, ~4x the price of GPT-5.1, and ~10x the price of Kimi-2.6.

                                                                                                                                                                                                                                                                                      Unfortunately I think the lesson they took from Anthropic is that devs get really reliant and even addicted on coding agents, and they'll happily pay any amount for even small benefits.

                                                                                                                                                                                                                                                                                        • kingstnap

                                                                                                                                                                                                                                                                                          yesterday at 7:18 PM

                                                                                                                                                                                                                                                                                          I feel like devs generally spend someone else's money on tokens. Either their employers or OpenAIs when they use a codex subscription.

                                                                                                                                                                                                                                                                                          If I put on my schizo hat. Something they might be doing is increasing the losses on their monthly codex subscriptions, to show that the API has a higher margin than before (the codex account massively in the negative, but the API account now having huge margins).

                                                                                                                                                                                                                                                                                          I've never seen an OpenAI investor pitch deck. But my guess is that API margins is one of the big ones they try to sell people on since Sama talks about it on Twitter.

                                                                                                                                                                                                                                                                                          I would be interested in hearing the insider stuff. Like if this model is genuinely like twice as expensive to serve or something.

                                                                                                                                                                                                                                                                                            • vineyardmike

                                                                                                                                                                                                                                                                                              yesterday at 8:52 PM

                                                                                                                                                                                                                                                                                              You can't build a business on per-seat subscriptions when you advertise making workers obsolete. API pricing with sustainable margins are the only way forward if you genuinely think you're going to cause (or accelerate) reduction in clients' headcount.

                                                                                                                                                                                                                                                                                              Additionally, the value generated by the best models with high-thinking and lots of context window is way higher than the cheap and tiny models, so you need to provide a "gateway drug" that lets people experience the best you offer.

                                                                                                                                                                                                                                                                                                • CryptoBanker

                                                                                                                                                                                                                                                                                                  today at 4:29 AM

                                                                                                                                                                                                                                                                                                  > You can't build a business on per-seat subscriptions when you advertise making workers obsolete.

                                                                                                                                                                                                                                                                                                  On the other hand I would argue that most workers' salaries are more like subscriptions than API type pricing (which would be more like an hourly contractor)

                                                                                                                                                                                                                                                                                              • ewrs

                                                                                                                                                                                                                                                                                                yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                Yeah and the increase in operating expenses is going to make managers start asking hard questions - this is good. It means eventually there will be budgets put in place - this will force OAI and Anthropic to innovate harder. Then we will see how things pan out. Ultimately a firm is not going to pay rent to these firms if the benefits dont exceed the costs.

                                                                                                                                                                                                                                                                                                  • mrwaffle

                                                                                                                                                                                                                                                                                                    today at 4:22 AM

                                                                                                                                                                                                                                                                                                    Meaning that you believe they're not trying their "hardest" to innovate? They must be slacking then.

                                                                                                                                                                                                                                                                                                    • girvo

                                                                                                                                                                                                                                                                                                      yesterday at 9:09 PM

                                                                                                                                                                                                                                                                                                      Budgets are already happening

                                                                                                                                                                                                                                                                                                      • dist-epoch

                                                                                                                                                                                                                                                                                                        yesterday at 8:12 PM

                                                                                                                                                                                                                                                                                                        > Ultimately a firm is not going to pay rent to these firms if the benefits dont exceed the costs.

                                                                                                                                                                                                                                                                                                        This is also true for the humans. They will need to provide more benefits than the coding agents cost.

                                                                                                                                                                                                                                                                                                          • eiksjs

                                                                                                                                                                                                                                                                                                            yesterday at 8:35 PM

                                                                                                                                                                                                                                                                                                            Humans are needed to use agents and these agents are not showing to be fully autonomous and require constant human review. In fact all you are getting is a splurge of stuff, people not thinking deeper anymore and the creation of more bottle necks and exacerbating the ones that already exist in an org.

                                                                                                                                                                                                                                                                                                            You sound like elon with the fsd will be here next year. Many cars have the self driving feature - most drivers don’t use it. Oh why is that I wonder.

                                                                                                                                                                                                                                                                                                    • mitjam

                                                                                                                                                                                                                                                                                                      yesterday at 8:04 PM

                                                                                                                                                                                                                                                                                                      The difference between sub and api price makes it hard to create competitive solutions on the app level.

                                                                                                                                                                                                                                                                                                        • irthomasthomas

                                                                                                                                                                                                                                                                                                          yesterday at 8:31 PM

                                                                                                                                                                                                                                                                                                          This was something I worried about after openai started building apps as well as models. Now all of the labs make no secret of the fact that they are going after the whole software industry. Its going to be hard to maintain functioning fair markets unless governments step in.

                                                                                                                                                                                                                                                                                                  • w10-1

                                                                                                                                                                                                                                                                                                    yesterday at 9:11 PM

                                                                                                                                                                                                                                                                                                    Price increases now aim to demonstrate market power for eventual IPO.

                                                                                                                                                                                                                                                                                                    If they can show that people will pay a lot for somewhat better performance, it raises the value of any performance lead they can maintain.

                                                                                                                                                                                                                                                                                                    If they demonstrate that and high switching costs, their franchise is worth scary amounts of money.

                                                                                                                                                                                                                                                                                                    • JohnLocke4

                                                                                                                                                                                                                                                                                                      yesterday at 7:27 PM

                                                                                                                                                                                                                                                                                                      Sometimes I wonder if innovation in the AI space has stalled and recent progress is just a product of increased compute. Competence is increasing exponentially[1] but I guess it doesn't rule it out completely. I would postulate that a radical architecture shift is needed for the singularity though

                                                                                                                                                                                                                                                                                                      [1]https://arxiv.org/html/2503.14499v1 *Source is from March 2025 so make of it what you will.

                                                                                                                                                                                                                                                                                                        • scotty79

                                                                                                                                                                                                                                                                                                          today at 12:37 PM

                                                                                                                                                                                                                                                                                                          We are constantly getting smaller and faster models that are close in performance to state of the art from few months prior. And that's due to architectural inventions. I'm sure it takes some time for these inventions to proliferate to frontier and that some might not be applicable there but we are definitely going faster than just due to compute increase.

                                                                                                                                                                                                                                                                                                          It will get faster, but there are no singularities in the real world. Except possibly black holes, but we can't even be sure of that.

                                                                                                                                                                                                                                                                                                          • nomel

                                                                                                                                                                                                                                                                                                            yesterday at 7:34 PM

                                                                                                                                                                                                                                                                                                            > that devs get really reliant and even addicted on coding agents

                                                                                                                                                                                                                                                                                                            An alternative perspective is, devs highly value coding agents, and are willing to pay more because they're so useful. In other words, the market value of this limited resource is being adjusted to be closer to reality.

                                                                                                                                                                                                                                                                                                              • killingtime74

                                                                                                                                                                                                                                                                                                                yesterday at 8:52 PM

                                                                                                                                                                                                                                                                                                                It's not limited though there are alternative providers even now, much less when the price goes up. Chinese providers, European ones, local models.

                                                                                                                                                                                                                                                                                                                  • nomel

                                                                                                                                                                                                                                                                                                                    yesterday at 9:30 PM

                                                                                                                                                                                                                                                                                                                    > It's not limited though

                                                                                                                                                                                                                                                                                                                    Inference is not free, so all providers have a financial limit, and all providers have limited GPU/memory, so there's a physical material limit.

                                                                                                                                                                                                                                                                                                                    I suggest looking at the profits of these companies (while they scramble to stay competitive).

                                                                                                                                                                                                                                                                                                        • pxc

                                                                                                                                                                                                                                                                                                          yesterday at 7:28 PM

                                                                                                                                                                                                                                                                                                          Maybe that's true. But I think part of the issue is that for a lot of things developers want to do with them now— certainly for most of the things I want to do with them— they're either barely good enough, or not consistently good enough. And the value difference across that quality threshold is immense, even if the quality difference itself isn't.

                                                                                                                                                                                                                                                                                                          • pzo

                                                                                                                                                                                                                                                                                                            yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                            On top of that I noticed just right now after updating macos dekstop codex app, I got again by default set speed to 'fast' ('about 1.5x faster with increased plan usage'). They really want you to burn more tokens.

                                                                                                                                                                                                                                                                                                              • nubg

                                                                                                                                                                                                                                                                                                                yesterday at 10:53 PM

                                                                                                                                                                                                                                                                                                                wow wait so it wasn't just me leaving it on from an old session?

                                                                                                                                                                                                                                                                                                                sounds like criminal fraud to me tbh

                                                                                                                                                                                                                                                                                                            • 0xbadcafebee

                                                                                                                                                                                                                                                                                                              yesterday at 8:30 PM

                                                                                                                                                                                                                                                                                                              A fool and his money are soon parted

                                                                                                                                                                                                                                                                                                              • oh_no

                                                                                                                                                                                                                                                                                                                yesterday at 7:07 PM

                                                                                                                                                                                                                                                                                                                what's the source on that?

                                                                                                                                                                                                                                                                                                                  • puppystench

                                                                                                                                                                                                                                                                                                                    yesterday at 7:09 PM

                                                                                                                                                                                                                                                                                                                    In the announcement webpage:

                                                                                                                                                                                                                                                                                                                    >For API developers, gpt-5.5 will soon be available in the Responses and Chat Completions APIs at $5 per 1M input tokens and $30 per 1M output tokens, with a 1M context window.

                                                                                                                                                                                                                                                                                                                      • oh_no

                                                                                                                                                                                                                                                                                                                        yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                        oops, thanks. i had just been looking at their api docs

                                                                                                                                                                                                                                                                                                                • Mars008

                                                                                                                                                                                                                                                                                                                  today at 12:54 AM

                                                                                                                                                                                                                                                                                                                  > devs get really reliant and even addicted on coding agents

                                                                                                                                                                                                                                                                                                                  That's more about managers who hope AI will gradually replace stubborn and lazy devs. That will shift the balance to business ideas and connections out of technical side and investments.

                                                                                                                                                                                                                                                                                                                  Anyway, before singularity there going to be a huge change.

                                                                                                                                                                                                                                                                                                                  • throwaway613746

                                                                                                                                                                                                                                                                                                                    yesterday at 8:25 PM

                                                                                                                                                                                                                                                                                                                    [dead]

                                                                                                                                                                                                                                                                                                                • keyle

                                                                                                                                                                                                                                                                                                                  today at 6:07 AM

                                                                                                                                                                                                                                                                                                                  I did one review job that sent off three subagents and I blew the second half of my daily limit in 10 mins 13 seconds. Fun times.

                                                                                                                                                                                                                                                                                                                  • raincole

                                                                                                                                                                                                                                                                                                                    today at 1:59 AM

                                                                                                                                                                                                                                                                                                                    It's such a vague table for pricing information. 30-150 messages...? What?

                                                                                                                                                                                                                                                                                                                • minimaxir

                                                                                                                                                                                                                                                                                                                  yesterday at 6:08 PM

                                                                                                                                                                                                                                                                                                                  The more interesting part of the announcement than "it's better at benchmarks":

                                                                                                                                                                                                                                                                                                                  > To better utilize GPUs, Codex analyzed weeks’ worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work. The effort had an outsized impact, increasing token generation speeds by over 20%.

                                                                                                                                                                                                                                                                                                                  The ability for agentic LLMs to improve computational efficiency/speed is a highly impactful domain I wish was more tested than with benchmarks. From my experience Opus is still much better than GPT/Codex in this aspect, but given that OpenAI is getting material gains out of this type of performancemaxxing and they have an increasing incentive to continue doing so given cost/capacity issues, I wonder if OpenAI will continue optimizing for it.

                                                                                                                                                                                                                                                                                                                    • xiphias2

                                                                                                                                                                                                                                                                                                                      yesterday at 6:41 PM

                                                                                                                                                                                                                                                                                                                      There's already KernelBench which tests CUDA kernel optimizations.

                                                                                                                                                                                                                                                                                                                      On the other hand all companies know that optimizing their own infrastructure / models is the critical path for ,,winning'' against the competition, so you can bet they are serious about it.

                                                                                                                                                                                                                                                                                                                      • xtracto

                                                                                                                                                                                                                                                                                                                        yesterday at 11:43 PM

                                                                                                                                                                                                                                                                                                                        So, im working in some high performance data processing in Rust. I had hit some performance walls, and needed to improve in the 100x or more scale.

                                                                                                                                                                                                                                                                                                                        I remembered the famous FizzBuzz Intel codegolf optimizations, and gave it to gemini pro, along with my code and instructions to "suggest optimizations similar to those, maybe not so low level, but clever" and it's suggestions were veerry cool.

                                                                                                                                                                                                                                                                                                                        LLM do not stop amazing me every day.

                                                                                                                                                                                                                                                                                                                        • amrrs

                                                                                                                                                                                                                                                                                                                          yesterday at 6:12 PM

                                                                                                                                                                                                                                                                                                                          Honestly the problem with these is how empirical it is, how someone can reproduce this? I love when Labs go beyond traditional benchies like MMLU and friends but these kind of statements don't help much either - unless it's a proper controlled study!

                                                                                                                                                                                                                                                                                                                            • minimaxir

                                                                                                                                                                                                                                                                                                                              yesterday at 6:21 PM

                                                                                                                                                                                                                                                                                                                              In a sense it's better than a benchmark: it's a practical, real-world, highly quantifiable improvement assuming there are no quality regressions and passes all test cases. I have been experimenting with this workflow across a variety of computational domains and have achieved consistent results with both Opus and GPT. My coworkers have independently used Opus for optimization suggestions on services in prod and they've led to much better performance (3x in some cases).

                                                                                                                                                                                                                                                                                                                              A more empirical test would be good for everyone (i.e. on equal hardware, give each agent the goal to implement an algorithm and make it as fast as possible, then quantify relative speed improvements that pass all test cases).

                                                                                                                                                                                                                                                                                                                                • squibonpig

                                                                                                                                                                                                                                                                                                                                  yesterday at 8:12 PM

                                                                                                                                                                                                                                                                                                                                  Yeah but like what if they're sorta embellishing it or just lying? That's the issue with not being reproducible.

                                                                                                                                                                                                                                                                                                                              • jstanley

                                                                                                                                                                                                                                                                                                                                yesterday at 6:58 PM

                                                                                                                                                                                                                                                                                                                                Oh, come on, if they do well on benchmarks people question how applicable they are in reality. If they do well in reality people complain that it's not a reproducible benchmark...

                                                                                                                                                                                                                                                                                                                                  • girvo

                                                                                                                                                                                                                                                                                                                                    yesterday at 9:53 PM

                                                                                                                                                                                                                                                                                                                                    That's easily explained by those being two different people with two different opinions?

                                                                                                                                                                                                                                                                                                                                      • 2goomba1stage

                                                                                                                                                                                                                                                                                                                                        today at 2:01 AM

                                                                                                                                                                                                                                                                                                                                        And together they make one single community that s effectively NEVER happy.

                                                                                                                                                                                                                                                                                                                        • astlouis44

                                                                                                                                                                                                                                                                                                                          yesterday at 6:10 PM

                                                                                                                                                                                                                                                                                                                          A playable 3D dungeon arena prototype built with Codex and GPT models. Codex handled the game architecture, TypeScript/Three.js implementation, combat systems, enemy encounters, HUD feedback, and GPT‑generated environment textures. Character models, character textures, and animations were created with third-party asset-generation tools

                                                                                                                                                                                                                                                                                                                          The game that this prompt generated looks pretty decent visually. A big part of this likely due to the fact the meshes were created using a seperate tool (probably meshy, tripo.ai, or similiar) and not generated by 5.5 itself.

                                                                                                                                                                                                                                                                                                                          It really seems like we could be at the dawn of a new era similiar to flash, where any gamer or hobbyist can generate game concepts quickly and instantly publish them to the web. Three.js in particular is really picking up as the primary way to design games with AI, in spite of the fact it's not even a game engine, just a web rendering library.

                                                                                                                                                                                                                                                                                                                            • 0x62

                                                                                                                                                                                                                                                                                                                              yesterday at 6:26 PM

                                                                                                                                                                                                                                                                                                                              FWIW I've been experimenting with Three.js and AI for the last ~3 years, and noticed a significant improvement in 5.4 - the biggest single generation leap for Three.js specifically. It was most evident in shaders (GLSL), but also apparent in structuring of Three.js scenes across multiple pages/components.

                                                                                                                                                                                                                                                                                                                              It still struggles to create shaders from scratch, but is now pretty adequate at editing existing shaders.

                                                                                                                                                                                                                                                                                                                              In 5.2 and below, GPT really struggled with "one canvas, multiple page" experiences, where a single background canvas is kept rendered over routes. In 5.4, it still takes a bit of hand-holding and frequent refactor/optimisation prompts, but is a lot more capable.

                                                                                                                                                                                                                                                                                                                              Excited to test 5.5 and see how it is in practice.

                                                                                                                                                                                                                                                                                                                                • CSMastermind

                                                                                                                                                                                                                                                                                                                                  yesterday at 6:29 PM

                                                                                                                                                                                                                                                                                                                                  > It still struggles to create shaders from scratch

                                                                                                                                                                                                                                                                                                                                  Oh just like a real developer

                                                                                                                                                                                                                                                                                                                                    • accrual

                                                                                                                                                                                                                                                                                                                                      yesterday at 7:30 PM

                                                                                                                                                                                                                                                                                                                                      Much respect for shader developers, it's a different way of thinking/programming

                                                                                                                                                                                                                                                                                                                                  • Pym

                                                                                                                                                                                                                                                                                                                                    yesterday at 10:03 PM

                                                                                                                                                                                                                                                                                                                                    One struggle I'm having (with Claude) is that most of what it knows about Three.js is outdated. I haven't used GPT in a while, is the grass greener?

                                                                                                                                                                                                                                                                                                                                    Have you tried any skills like cloudai-x/threejs-skills that help with that? Or built your own?

                                                                                                                                                                                                                                                                                                                                    • import

                                                                                                                                                                                                                                                                                                                                      yesterday at 9:49 PM

                                                                                                                                                                                                                                                                                                                                      Using Claude for the same context and it’s doing really well with the glsl. since like last September

                                                                                                                                                                                                                                                                                                                                  • dataviz1000

                                                                                                                                                                                                                                                                                                                                    yesterday at 8:05 PM

                                                                                                                                                                                                                                                                                                                                    LLM models can not do spacial reasoning. I haven't tried with GPT, however, Claude can not solve a Rubik Cube no matter how much I try with prompt engineering. I got Opus 4.6 to get ~70% of the puzzle solved but it got stuck. At $20 a run it prohibitively expensive.

                                                                                                                                                                                                                                                                                                                                    The point is if we can prompt an LLM to reason about 3 dimensions, we likely will be able to apply that to math problems which it isn't able to solve currently.

                                                                                                                                                                                                                                                                                                                                    I should release my Rubiks Cube MCP server with the challenge to see if someone can write a prompt to solve a Rubik's Cube.

                                                                                                                                                                                                                                                                                                                                      • variodot

                                                                                                                                                                                                                                                                                                                                        today at 10:30 AM

                                                                                                                                                                                                                                                                                                                                        I’ve had a similar experience building a geometry/woodworking-flavored web app with Three.js and SVG rendering. It’s been kind of wild how quickly the SOTA models let me approach a new space in spatial development and rendering 3d (or SA optimization approaches, for that matter). That said, there are still easy "3d app" mistakes it makes like z-axis flipping or misreading coordinate conventions. But these models make similar mistakes with CSS and page awareness. Both require good verification loops to be effective.

                                                                                                                                                                                                                                                                                                                                          • dataviz1000

                                                                                                                                                                                                                                                                                                                                            today at 10:51 AM

                                                                                                                                                                                                                                                                                                                                            I think there is a pattern. It has a hard time with temporal and spatial.

                                                                                                                                                                                                                                                                                                                                            Temporal. I had a research project where the LLM had no concept about preventing data from the future to leak in. I eventually had to create a wall clock and an agent that would step through every line of code and ensure by writing that lines logic and why there is no future of the wall clock data leaking.

                                                                                                                                                                                                                                                                                                                                            Spatial. I created a canvas for rendering thinking model's attention and feedforward layers for data visualization animations. It was having a hard time working with it until I pointed Opus 4.7 to some ancient JavaScript code [0] about projecting 3d to 2d and after searching Github repositories. It worked perfect with pan zoom in one shot after that.

                                                                                                                                                                                                                                                                                                                                            No matter how hard I tried I couldn't get it to stack all the layers correctly. It must have remembered all the parts for projecting 3d to 2d because it could not figure out how to position the layers.

                                                                                                                                                                                                                                                                                                                                            There is a ton of information burnt into the weights during training but it can not reason about it. When it does work well with spatial and temporal it is more slight of hand than being able to generalize.

                                                                                                                                                                                                                                                                                                                                            People say, why not just do reinforcement learning? That can't generalize in the same way a LLM can. I'm thinking about doing the Rubik's Cube because if people can solve that it might open up solutions for working temporal and spatial problems.

                                                                                                                                                                                                                                                                                                                                            [0] https://jakesgordon.com/writing/javascript-racer-v1-straight...

                                                                                                                                                                                                                                                                                                                                        • embedding-shape

                                                                                                                                                                                                                                                                                                                                          yesterday at 9:41 PM

                                                                                                                                                                                                                                                                                                                                          > I should release my Rubiks Cube MCP server with the challenge to see if someone can write a prompt to solve a Rubik's Cube.

                                                                                                                                                                                                                                                                                                                                          Do it, I'm game! You nerdsniped me immediately and my brain went "That sounds easy, I'm sure I could do that in a night" so I'm surely not alone in being almost triggered by what you wrote. I bet I could even do it with a local model!

                                                                                                                                                                                                                                                                                                                                          • versteegen

                                                                                                                                                                                                                                                                                                                                            today at 4:43 AM

                                                                                                                                                                                                                                                                                                                                            Interesting (would like to hear more), but solving a Rubiks cube would appear to be a poor way to measure spatial understanding or reasoning. Ordinary human spatial intuition lets you think about how to move a tile to a certain location, but not really how to make consistent progress towards a solution; what's needed is knowledge of solution techniques. I'd say what you're measuring is 'perception' rather than reasoning.

                                                                                                                                                                                                                                                                                                                                              • William_BB

                                                                                                                                                                                                                                                                                                                                                today at 6:41 AM

                                                                                                                                                                                                                                                                                                                                                > what's needed is knowledge of solution techniques

                                                                                                                                                                                                                                                                                                                                                That's definitely in the training data

                                                                                                                                                                                                                                                                                                                                            • Melatonic

                                                                                                                                                                                                                                                                                                                                              yesterday at 11:01 PM

                                                                                                                                                                                                                                                                                                                                              What about a model designed for robotics and vision? Seems like an LLM trained on text would inherently not be great for this.

                                                                                                                                                                                                                                                                                                                                              DeepMinds other models however might do better?

                                                                                                                                                                                                                                                                                                                                              • holoduke

                                                                                                                                                                                                                                                                                                                                                today at 11:26 AM

                                                                                                                                                                                                                                                                                                                                                I bet I can even do it with the smallest gemma 4 model using a prompt of max 500 characters.

                                                                                                                                                                                                                                                                                                                                                • snet0

                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:24 PM

                                                                                                                                                                                                                                                                                                                                                  How are you handing the cube state to the model?

                                                                                                                                                                                                                                                                                                                                                    • dataviz1000

                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:45 PM

                                                                                                                                                                                                                                                                                                                                                      Does this answer the question?

                                                                                                                                                                                                                                                                                                                                                      Opus 4.6 got the cross and started to get several pieces on the correct faces. It couldn't reason past this. You can see the prompts and all the turn messages.

                                                                                                                                                                                                                                                                                                                                                      https://gist.github.com/adam-s/b343a6077dd2f647020ccacea4140...

                                                                                                                                                                                                                                                                                                                                                      edit: I can't reply to message below. The point isn't can we solve a Rubik's Cube with a python script and tool calls. The point is can we get an LLM to reason about moving things in 3 dimensions. The prompt is a puzzle in the way that a Rubik's Cube is a puzzle. A 7 year old child can learn 6 moves and figure out how to solve a Rubik's Cube in a weekend, the LLM can't solve it. However, can, given the correct prompt, a LLM solve it? The prompt is the puzzle. That is why it is fun and interesting. Plus, it is a spatial problem so if we solve that we solve a massive class of problems including huge swathes of mathematics the LLMs can't touch yet.

                                                                                                                                                                                                                                                                                                                                                        • libraryofbabel

                                                                                                                                                                                                                                                                                                                                                          today at 4:17 AM

                                                                                                                                                                                                                                                                                                                                                          I wonder if the difficulties LLMs have with ā€œseeingā€ complex detail in images is muddying the problem here. What if you hand it the cube state in text form? (You could try ascii art if you want a middle ground.)

                                                                                                                                                                                                                                                                                                                                                          If you want to isolate the issue, try getting the LLM itself to turn the images into a text representation of the cube state and check for accuracy. If it can’t see state correctly it certainly won’t be able to solve.

                                                                                                                                                                                                                                                                                                                                                          • osti

                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:09 PM

                                                                                                                                                                                                                                                                                                                                                            Can't they write a script to solve rubik cubes?

                                                                                                                                                                                                                                                                                                                                                              • Jensson

                                                                                                                                                                                                                                                                                                                                                                today at 3:44 AM

                                                                                                                                                                                                                                                                                                                                                                That doesn't test whether the model can follow and execute a dynamic plan reliably.

                                                                                                                                                                                                                                                                                                                                                        • yesterday at 9:43 PM

                                                                                                                                                                                                                                                                                                                                                      • Torkel

                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:37 PM

                                                                                                                                                                                                                                                                                                                                                        *yet

                                                                                                                                                                                                                                                                                                                                                    • vunderba

                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:28 PM

                                                                                                                                                                                                                                                                                                                                                      I’ve had a lot of success using LLMs to help with my Three.js based games and projects. Many of my weird clock visualizations relied heavily on it.

                                                                                                                                                                                                                                                                                                                                                      It might not be a game engine, but it’s the de facto standard for doing WebGL 3D. And since it’s been around forever, there’s a massive amount of training data available for it.

                                                                                                                                                                                                                                                                                                                                                      Before LLMs were a thing, I relied more on Babylon.js, since it’s a bit higher level and gives you more batteries included for game development.

                                                                                                                                                                                                                                                                                                                                                      • peder

                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:50 PM

                                                                                                                                                                                                                                                                                                                                                        > It really seems like we could be at the dawn of a new era similiar to flash

                                                                                                                                                                                                                                                                                                                                                        We've been there for a while.... creativity has been the primary bottleneck

                                                                                                                                                                                                                                                                                                                                                        • kingstnap

                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                          The meshes look interesting, but the gameplay is very basic. The tank one seems more sophisticated with the flying ships and whatnot.

                                                                                                                                                                                                                                                                                                                                                          What's strange is that this Pietro Schirano dude seems to write incredibly cargo cult prompts.

                                                                                                                                                                                                                                                                                                                                                            Game created by Pietro Schirano, CEO of MagicPath
                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                            Prompt: Create a 3D game using three.js. It should be a UFO shooter where I control a tank and shoot down UFOs flying overhead.
                                                                                                                                                                                                                                                                                                                                                            - Think step by step, take a deep breath. Repeat the question back before answering.
                                                                                                                                                                                                                                                                                                                                                            - Imagine you're writing an instruction message for a junior developer who's going to go build this. Can you write something extremely clear and specific for them, including which files they should look at for the change and which ones need to be fixed?
                                                                                                                                                                                                                                                                                                                                                            -Then write all the code. Make the game low-poly but beautiful.
                                                                                                                                                                                                                                                                                                                                                            - Remember, you are an agent: please keep going until the user's query is completely resolved before ending your turn and yielding back to the user. Decompose the user's query into all required sub-requests and confirm that each one is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.
                                                                                                                                                                                                                                                                                                                                                            - You must plan extensively in accordance with the workflow steps before making subsequent function calls, and reflect extensively on the outcomes of each function call, ensuring the user's query and related sub-requests are completely resolved.

                                                                                                                                                                                                                                                                                                                                                            • torginus

                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:57 PM

                                                                                                                                                                                                                                                                                                                                                              It's weird how people pep talk the AI - if my Jira tickets looked like this, I would throw a fit.

                                                                                                                                                                                                                                                                                                                                                              I guess these people think they have special prompt engineering skills, and doing it like this is better than giving the AI a dry list of requirements (fwiw, they might be even right)

                                                                                                                                                                                                                                                                                                                                                                • mattgreenrocks

                                                                                                                                                                                                                                                                                                                                                                  yesterday at 8:15 PM

                                                                                                                                                                                                                                                                                                                                                                  It’s not surprising to me that the same crowd that cheers for the demise of software engineering skills invented its own notion of AI prompting skills.

                                                                                                                                                                                                                                                                                                                                                                  Too bad they can veer sharply into cringe territory pretty fast: ā€œas an accomplished Senior Principal Engineer at a FAANG with 22 years of experience, create a todo list app.ā€ It’s like interactive fanfiction.

                                                                                                                                                                                                                                                                                                                                                                    • dr_kiszonka

                                                                                                                                                                                                                                                                                                                                                                      today at 1:46 AM

                                                                                                                                                                                                                                                                                                                                                                      That's quite similar to the AI Studio's prompt. You are a world-class frontend engineer...

                                                                                                                                                                                                                                                                                                                                                                      • eiksjs

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:33 PM

                                                                                                                                                                                                                                                                                                                                                                        Indeed it is so utterly cringe.

                                                                                                                                                                                                                                                                                                                                                                    • eloisant

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                      Yes, this is cargo cult.

                                                                                                                                                                                                                                                                                                                                                                      This remind me of so called "optimization" hacks that people keep applying years after their languages get improved to make them unnecessary or even harmful.

                                                                                                                                                                                                                                                                                                                                                                      Maybe at one point it helped to write prompts in this weird way, but with all the progress going on both in the models and the harness if it's not obsolete yet it will soon be. Just crufts that consumes tokens and fills the context window for nothing.

                                                                                                                                                                                                                                                                                                                                                                  • irthomasthomas

                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                    > Think Step By Step

                                                                                                                                                                                                                                                                                                                                                                    What is this, 2023?

                                                                                                                                                                                                                                                                                                                                                                    I feel like this was generated by a model tapping in to 2023 notions of prompt engineering.

                                                                                                                                                                                                                                                                                                                                                                      • retr0rocket

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:22 PM

                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                    • skirano

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:53 PM

                                                                                                                                                                                                                                                                                                                                                                      Pietro here, I just published a video of it: https://x.com/skirano/status/2047403025094905964?s=20

                                                                                                                                                                                                                                                                                                                                                                      • tantalor

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                        It comes across as an elaborate, sparkly motivational cat poster.

                                                                                                                                                                                                                                                                                                                                                                        *BELIEVE!* https://www.youtube.com/watch?v=D2CRtES2K3E

                                                                                                                                                                                                                                                                                                                                                                      • bredren

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:24 PM

                                                                                                                                                                                                                                                                                                                                                                        The prompt did not specify advanced gameplay.

                                                                                                                                                                                                                                                                                                                                                                        I do not see instructions to assist in task decomposition and agent ~"motivation" to stay aligned over long periods as cargo culting.

                                                                                                                                                                                                                                                                                                                                                                        See up thread for anecdotes [1].

                                                                                                                                                                                                                                                                                                                                                                        > Decompose the user's query into all required sub-requests and confirm that each one is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure the problem is solved.

                                                                                                                                                                                                                                                                                                                                                                        I see this as a portrayal of the strength of 5.5, since it suggests the ability to be assigned this clearly important role to ~one shot requests like this.

                                                                                                                                                                                                                                                                                                                                                                        I've been using a cli-ai-first task tool I wrote to process complex "parent" or "umberella" into decomposed subtasks and then execute on them.

                                                                                                                                                                                                                                                                                                                                                                        This has allowed my workflows to float above the ups and downs of model performance.

                                                                                                                                                                                                                                                                                                                                                                        That said, having the AI do the planning for a big request like this internally is not good outside a demo.

                                                                                                                                                                                                                                                                                                                                                                        Because, you want the planning of the AI to be part of the historical context and available for forensics due to stalls, unwound details or other unexpected issues at any point along the way.

                                                                                                                                                                                                                                                                                                                                                                        [1] https://news.ycombinator.com/item?id=47879819

                                                                                                                                                                                                                                                                                                                                                                        • ahoka

                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:57 PM

                                                                                                                                                                                                                                                                                                                                                                          "take a deep breath"

                                                                                                                                                                                                                                                                                                                                                                          OMFG

                                                                                                                                                                                                                                                                                                                                                                            • jameshart

                                                                                                                                                                                                                                                                                                                                                                              today at 4:33 AM

                                                                                                                                                                                                                                                                                                                                                                              Claude would check to see if it had any breathing skills, if it doesn't find any it would start installing npm modules for breathing.

                                                                                                                                                                                                                                                                                                                                                                      • mindhunter

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:32 PM

                                                                                                                                                                                                                                                                                                                                                                        A friend is building Jamboree[1] (prev name "Spielwerk") for iOS. An app to build and share games. They're all web based so they're easy to share.

                                                                                                                                                                                                                                                                                                                                                                        [1] https://apps.apple.com/uz/app/jamboree-game-maker/id67473110...

                                                                                                                                                                                                                                                                                                                                                                        • yesterday at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                          • yesterday at 7:15 PM

                                                                                                                                                                                                                                                                                                                                                                            • nemo44x

                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:15 PM

                                                                                                                                                                                                                                                                                                                                                                              It’s like all these things though - it’s not a real production worthy product. It’s a super-demo. It looks amazing until you realize there’s many months of work to make it something of quality and value.

                                                                                                                                                                                                                                                                                                                                                                              I think people are starting to catch on to where we really are right now. Future models will be better but we are entering a trough of dissolution and this attitude will be widespread in a few months.

                                                                                                                                                                                                                                                                                                                                                                              • ZeWaka

                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                I personally don't think the gameplay itself is that impressive.

                                                                                                                                                                                                                                                                                                                                                                                • gregpred

                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:12 PM

                                                                                                                                                                                                                                                                                                                                                                                  [flagged]

                                                                                                                                                                                                                                                                                                                                                                              • 6thbit

                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                          Mythos     5.5
                                                                                                                                                                                                                                                                                                                                                                                    SWE-bench Pro          77.8%*   58.6%
                                                                                                                                                                                                                                                                                                                                                                                    Terminal-bench-2.0     82.0%    82.7%*
                                                                                                                                                                                                                                                                                                                                                                                    GPQA Diamond           94.6%*   93.6%
                                                                                                                                                                                                                                                                                                                                                                                    H. Last Exam           56.8%*   41.4%
                                                                                                                                                                                                                                                                                                                                                                                    H. Last Exam (tools)   64.7%*   52.2%    
                                                                                                                                                                                                                                                                                                                                                                                    BrowseComp             86.9%    84.4%  (90.1% Pro)*
                                                                                                                                                                                                                                                                                                                                                                                    OSWorld-Verified       79.6%*   78.7%
                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                Still far from Mythos on SWE-bench but quite comparable otherwise. Source for mythos values: https://www.anthropic.com/glasswing

                                                                                                                                                                                                                                                                                                                                                                                  • aliljet

                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:47 PM

                                                                                                                                                                                                                                                                                                                                                                                    Mythos is only real when it's actually available. If you're using Opus 4.7 right now, you know how incredibly nerfed the Opus autonomy is in service of perceived safety. I'm not so confident this will be as great as Anthropic wants us to believe..

                                                                                                                                                                                                                                                                                                                                                                                    • XCSme

                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:51 PM

                                                                                                                                                                                                                                                                                                                                                                                      They mentioned in their release page, that the Claude team noticed memorization of the SWE-bench test, so the test is actually in the training data.

                                                                                                                                                                                                                                                                                                                                                                                      Here: https://www.anthropic.com/news/claude-opus-4-7#:~:text=memor...

                                                                                                                                                                                                                                                                                                                                                                                        • William_BB

                                                                                                                                                                                                                                                                                                                                                                                          today at 6:44 AM

                                                                                                                                                                                                                                                                                                                                                                                          Good luck arguing with SWE benchmark purists

                                                                                                                                                                                                                                                                                                                                                                                      • kaonashi-tyc-01

                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:49 PM

                                                                                                                                                                                                                                                                                                                                                                                        I did some study on Verified, not Pro, but Mythos number there rings a lot of questions on my end.

                                                                                                                                                                                                                                                                                                                                                                                        If you look at the SWEBench official submissions: https://github.com/SWE-bench/experiments/tree/main/evaluatio..., filter all models after Sonnet 4, and aggregate ALL models' submission across 500 problems, what I found that the aggregated resolution rate is 93% (sharp).

                                                                                                                                                                                                                                                                                                                                                                                        Mythos gets 93.7%, meaning it solves problems that no other models could ever solve. I took a look at those problems, then I became even more suspicious, for the remaining 7% problems, it is almost impossible to resolve those issues without looking at the testing patch ahead of time, because how drastically the solution itself deviates from the problem statement, it almost feels like it is trying to solve a different problem.

                                                                                                                                                                                                                                                                                                                                                                                        Not that I am saying Mythos is cheating, but it might be too capable to remember all states of said repos, that it is able to reverse engineer the TRUE problem statement by diffing within its own internal memory. I think it could be a unique phenomena of evaluation awareness. Otherwise I genuinely couldn't think of exactly how it could be this precise in deciphering such unspecific problem statements.

                                                                                                                                                                                                                                                                                                                                                                                          • yfontana

                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:09 PM

                                                                                                                                                                                                                                                                                                                                                                                            OpenAI wrote a couple months ago that they do not consider SWE Bench Verified a meaningful benchmark anymore (and they were the ones who published it in the first place): https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

                                                                                                                                                                                                                                                                                                                                                                                              • kaonashi-tyc-01

                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                Yep, I read this blog. What confuses me is that Anthropic doesn't seem to be bothered by this study and keeps publishing Verified results.

                                                                                                                                                                                                                                                                                                                                                                                                That is what gets me curious in the first place. The fact Mythos scored so high, IMO, exposes some issues with this model: it is able to solve seemingly impossible to solve problems.

                                                                                                                                                                                                                                                                                                                                                                                                Without cheating allegation, which I don't think ANT is doing, it has to be doing some fortune telling/future reading to score that high at all.

                                                                                                                                                                                                                                                                                                                                                                                        • alansaber

                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:27 PM

                                                                                                                                                                                                                                                                                                                                                                                          A single benchmark is meaningless, you always get quirky results on some benchmarks.

                                                                                                                                                                                                                                                                                                                                                                                      • silvertaza

                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:57 PM

                                                                                                                                                                                                                                                                                                                                                                                        Still huge hallucination rate, unfortunately at 86%. To compare, Opus sits at 36%.

                                                                                                                                                                                                                                                                                                                                                                                        Source: https://artificialanalysis.ai/models?omniscience=omniscience...

                                                                                                                                                                                                                                                                                                                                                                                          • dubcanada

                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:42 PM

                                                                                                                                                                                                                                                                                                                                                                                            grok is 17%? And that's the lowest, most models are like 80%+?

                                                                                                                                                                                                                                                                                                                                                                                            While hallucination is probably closer to 100% depending on the question. This benchmark makes no sense.

                                                                                                                                                                                                                                                                                                                                                                                              • Jensson

                                                                                                                                                                                                                                                                                                                                                                                                today at 3:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                > While hallucination is probably closer to 100% depending on the question.

                                                                                                                                                                                                                                                                                                                                                                                                But the benchmark didn't ask those questions, and it seems grok is very well at saying it doesn't know the answer otherwise.

                                                                                                                                                                                                                                                                                                                                                                                                • elAhmo

                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                  No one serious uses grok.

                                                                                                                                                                                                                                                                                                                                                                                                    • ajdegol

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                      @grok is this true?

                                                                                                                                                                                                                                                                                                                                                                                                        • NamlchakKhandro

                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:15 AM

                                                                                                                                                                                                                                                                                                                                                                                                          no

                                                                                                                                                                                                                                                                                                                                                                                                      • RALaBarge

                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                        YMMV but Grok 4.1 Fast can usually find via static analysis a few things that other models dont seem to catch with the same prompt

                                                                                                                                                                                                                                                                                                                                                                                                        • d0gsg0w00f

                                                                                                                                                                                                                                                                                                                                                                                                          today at 3:06 AM

                                                                                                                                                                                                                                                                                                                                                                                                          Why not? Honest question.

                                                                                                                                                                                                                                                                                                                                                                                                      • MagicMoonlight

                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:40 AM

                                                                                                                                                                                                                                                                                                                                                                                                        It makes sense. Grok is taught to answer the question, regardless of how explicit or extreme it is. These other models are taught to suppress any wrongthink. That's going to make it hard to answer things correctly. If you've been told to answer something incorrectly because it's wrong, then you'll have to make up an answer.

                                                                                                                                                                                                                                                                                                                                                                                                    • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                      There's something off with this because Haiku should not be that good.

                                                                                                                                                                                                                                                                                                                                                                                                        • camgunz

                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                          Hallucination benchmarks accept "I don't know", which Haiku did at least a little. Here are other benchmarks corroborating: https://suprmind.ai/hub/ai-hallucination-rates-and-benchmark...

                                                                                                                                                                                                                                                                                                                                                                                                          • rattray

                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                            I've been very curious about that too. I wonder if it's actually much better at admitting when it doesn't know something, because it thinks it's a "dumber model". But I haven't played with this at all myself.

                                                                                                                                                                                                                                                                                                                                                                                                            • jwpapi

                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                              The hallucination benchmark is hallucinating

                                                                                                                                                                                                                                                                                                                                                                                                          • dakolli

                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                            This indicates they want this behavior, they know the person asking the question probably doesn't understand the problem entirely (or why would they be asking), so they'd prefer a confident response, regardless of outcomes, because the point is to sell the technologies competency (and the perception thereof), not the capabilities, to a bunch of people that have no clue what they're talking about.

                                                                                                                                                                                                                                                                                                                                                                                                            LLMs will ruin your product, have fun trusting a billionaires thinking machine they swear is capable of replacing your employees if you just pay them 75% of your labor budget.

                                                                                                                                                                                                                                                                                                                                                                                                              • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:24 AM

                                                                                                                                                                                                                                                                                                                                                                                                                We don't want hallucinations either, I promise you.

                                                                                                                                                                                                                                                                                                                                                                                                                A few biased defenses:

                                                                                                                                                                                                                                                                                                                                                                                                                - I'll note that this eval doesn't have web search enabled, but we train our models to use web search in ChatGPT, Codex, and our API. I'd be curious to see hallucination rates with web search on.

                                                                                                                                                                                                                                                                                                                                                                                                                - This eval only measures binary attempted vs did not attempt, but doesn't really reward any sort of continuous hedging like "I think it's X, but to be honest I'm not sure."

                                                                                                                                                                                                                                                                                                                                                                                                                - On the flip side, GPT-5.5 has the highest accuracy score.

                                                                                                                                                                                                                                                                                                                                                                                                                - With any rate over 1% (whether 30% or 70%), you should be verifying anything important anyway.

                                                                                                                                                                                                                                                                                                                                                                                                                - On our internal eval made from de-identified ChatGPT prompts that previously elicited hallucinations, we've actually been improving substantially from 5.2 to 5.4 to 5.5. So as always, progress depends on how you measure it.

                                                                                                                                                                                                                                                                                                                                                                                                                - Models that ask more clarifying questions will do better on this eval, even if they are just as likely to hallucinate after the clarifying question.

                                                                                                                                                                                                                                                                                                                                                                                                                Still, Anthropic has done a great job here and I hope we catch up to them on this eval in the future.

                                                                                                                                                                                                                                                                                                                                                                                                                • calf

                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                  On ChatGPT 5.3 Plus subscription I find that long informal chats tend to reveal unsatisfactory answers and biases, at this point after 10 rounds of replies I end up having to correct it so much that it starts to agree with my initial arguments full circle. I don't see how this behavior is acceptable or safe for real work. Like are programmers and engineers using LLMs completely differently than I'm doing, because the underlying technology is fundamentally the same.

                                                                                                                                                                                                                                                                                                                                                                                                                    • William_BB

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:50 AM

                                                                                                                                                                                                                                                                                                                                                                                                                      Totally agreed, this has been and will continue to be a problem for all existing models.

                                                                                                                                                                                                                                                                                                                                                                                                                      > Like are programmers and engineers using LLMs completely differently than I'm doing

                                                                                                                                                                                                                                                                                                                                                                                                                      No, but the complexity of the problem matters. Lots of engineers doing basic CRUD and prototyping overestimate the capabilities of LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                          • mudkipdev

                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                            This is 3x the price of GPT-5.1, released just 6 months ago. Is no one else alarmed by the trend? What happens when the cheaper models are deprecated/removed over time?

                                                                                                                                                                                                                                                                                                                                                                                                              • Night_Thastus

                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                This is entirely expected. The low prices of using LLMs early on was totally and completely unsustainable. The companies providing such services were (and still are) burning money by the truckload.

                                                                                                                                                                                                                                                                                                                                                                                                                The hope is to get a big userbase who eventually become dependent on it for their workflow, then crank up the price until it finally becomes profitable.

                                                                                                                                                                                                                                                                                                                                                                                                                The price for all models by all companies will continue to go up, and quickly.

                                                                                                                                                                                                                                                                                                                                                                                                                  • oezi

                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    I recently looked at this a bit but came away with the impression that at least on API pricing the models should be very profitable considering primarily the electricity cost.

                                                                                                                                                                                                                                                                                                                                                                                                                    Subscriptions and free plans are the thing that can easily burn money.

                                                                                                                                                                                                                                                                                                                                                                                                                      • Night_Thastus

                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:03 AM

                                                                                                                                                                                                                                                                                                                                                                                                                        The physical buildouts and massive R+D spending is the big part.

                                                                                                                                                                                                                                                                                                                                                                                                                    • viktorcode

                                                                                                                                                                                                                                                                                                                                                                                                                      today at 10:44 AM

                                                                                                                                                                                                                                                                                                                                                                                                                      > This is entirely expected. The low prices of using LLMs early on was totally and completely unsustainable.

                                                                                                                                                                                                                                                                                                                                                                                                                      Do you think this is true for DeepSeek as well?

                                                                                                                                                                                                                                                                                                                                                                                                                      • subhobroto

                                                                                                                                                                                                                                                                                                                                                                                                                        today at 2:31 AM

                                                                                                                                                                                                                                                                                                                                                                                                                        > The price for all models by all companies will continue to go up, and quickly.

                                                                                                                                                                                                                                                                                                                                                                                                                        This might entirely be true but I'm hoping that's because the frontier models are just actually more expensive to run as well.

                                                                                                                                                                                                                                                                                                                                                                                                                        Said another way, I would hope, the price of GPT-5.5 falls significantly in a year when GPT-5.8 is out.

                                                                                                                                                                                                                                                                                                                                                                                                                        Someone else on this post commented:

                                                                                                                                                                                                                                                                                                                                                                                                                        > For API usage, GPT-5.5 is 2x the price of GPT-5.4, ~4x the price of GPT-5.1, and ~10x the price of Kimi-2.6.

                                                                                                                                                                                                                                                                                                                                                                                                                        Having used Kimi-2.6, it can go on for hours spewing nonsense. I personally am happy to pay 10x the price of something that doesn't help me, for something else that does, in even half the time.

                                                                                                                                                                                                                                                                                                                                                                                                                    • energy123

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                      Look a cost per intelligence or cost per task instead of cost per token.

                                                                                                                                                                                                                                                                                                                                                                                                                        • yokoprime

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          How do I reliably measure 1 unit of intelligence?

                                                                                                                                                                                                                                                                                                                                                                                                                            • wellthisisgreat

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              In pelicans, obviously

                                                                                                                                                                                                                                                                                                                                                                                                                          • ulimn

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            Isn't the outcome / solution for a given task non-deterministic? So can we reliably measure that?

                                                                                                                                                                                                                                                                                                                                                                                                                              • foota

                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                Yes, sort of. Generally you can measure the pass rate on a benchmark given a fixed compute budget. A sufficiently smart model can hit a high pass rate with fewer tokens/compute. Check out the cost efficiency on https://artificialanalysis.ai/ (say this posted here the other day, pretty neat charts!)

                                                                                                                                                                                                                                                                                                                                                                                                                                • genericresponse

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  Statistically. Do many trials and measure how often it succeeds/fails.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • torginus

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    This is the only correct take. The only metric that matters is cost per desired outcome.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • dns_snek

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      Repetition and statistics, if you have $1000++ you didn't need anyway.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • throwuxiytayq

                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        It's much easier to measure a language model's intelligence than a human's because you can take as many samples as you want without affecting its knowledge. And we do measure human intelligence.

                                                                                                                                                                                                                                                                                                                                                                                                                                • Schlagbohrer

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 10:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  As others have mentioned you're ignoring the long tail of open-weights models which can be self hosted. As long as that quasi-open-source competition keeps up the pace, it will put a cap on how expensive the frontier models can get before people have to switch to self-hosting.

                                                                                                                                                                                                                                                                                                                                                                                                                                  That's a big if, though. I wish Meta were still releasing top of the line, expensively produced open-weights models. Or if Anthropic, Google, or X would release an open mini version.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • Wowfunhappy

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      Well, Google does release mini open versions of their models. https://deepmind.google/models/gemma/gemma-4/

                                                                                                                                                                                                                                                                                                                                                                                                                                        • deaux

                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          And they're incredibly good for their size.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • boppo1

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 2:50 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                              Which, unfortunately is still slow unusable garbage compared to fronteir models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                • deaux

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 4:49 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Not at all, it's more than enough for a large range of tasks. As for slow, that's just a function of how much compute you throw at it, which you actually control unlike with closed weights models.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    It's far more meaningful to look at the actual cost to successfully something. The token efficiency of GPT-5.5 is real; as well as it just being far better for work.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • operatingthetan

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      We know they cost much more than this for OpenAI. Assume prices will continue to climb until they are making money.

                                                                                                                                                                                                                                                                                                                                                                                                                                        • horiap

                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          How do we know that? There is a large gap between API pricing for SOTA models and similarly sized OSS models hosted by 3rd party providers.

                                                                                                                                                                                                                                                                                                                                                                                                                                          Sure, they’re distilled and should be cheaper to run but at the same time, these hosting providers do turn a margin on these given it’s their core business, unless they do it out of the kindness of their heart.

                                                                                                                                                                                                                                                                                                                                                                                                                                          So it’s hard for me to imagine these providers are losing money on API pricing.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • beering

                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                            source? There have also been a bunch of people here saying the opposite

                                                                                                                                                                                                                                                                                                                                                                                                                                        • dandaka

                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          SOTA models get distilled to open source weights in ~6 months. So paying premium for bleeding edge performance sounds like a fair compensation for enormous capex.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • typs

                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                            GPT-4 cost 6x on input and 2x output tokens when it was released as compared go GPT-5.5

                                                                                                                                                                                                                                                                                                                                                                                                                                            • kuatroka

                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                              Not really a big problem. Switch to KIMI, Qwen, GLM. You’ll get 95% quality of GPT or Anthropic for a 10th of a price. I feel like the real dependency is more mental, more of a habit but if you actually dip your toes outside OpenAI, Anthropic, Gemini from time to time, you realise that the actual difference in code is not huge if prompted in a good way. Maybe you’ll have to tell it to do something twice and it won’t be a one shot, but it’s really not an issue at all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                • Mashimo

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 4:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  I use glm and I like it, not they also increased the price to 18 usd /month.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  I think Kimi and qwen are similar?

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ramon156

                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 6:27 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      Wasn't there a limited offer? I wouldn't call that "increasing the prices"

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Mashimo

                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          I think both? Limited offer stopped and price increase. But the point was that is now a similar price to ChatGPT and Claude code, and not 10th of the price.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  • nubg

                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                    God I hope this is true.

                                                                                                                                                                                                                                                                                                                                                                                                                                                    Where can i find up to date resources on open source models for coding?

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • vibe42

                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 2:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                        https://old.reddit.com/r/LocalLLaMA/

                                                                                                                                                                                                                                                                                                                                                                                                                                                        Bit of a hype madhouse whenever a new model is released, but it's pretty easy to filter out simple hype from people showing reproducible experiments, specific configs for llama.cpp, github links etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                • thrawa8387336

                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 2:53 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Apparently the cost/price is 20x in the major providers. Not clear how it is a business

                                                                                                                                                                                                                                                                                                                                                                                                                                                  • msdz

                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                    Such an increase tracks the company's valuation trend, which they constantly, somehow have to justify (let alone break even on costs).

                                                                                                                                                                                                                                                                                                                                                                                                                                                • applfanboysbgon

                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  If there's a bingo card for model releases, "our [superlative] and [superlative] model yet" is surely the free space.

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • tom1337

                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      Do "our [superlative] and [superlative] [product] yet" and you have pretty much every product launch

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • xnx

                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      "our newest and most expensive model yet"

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • yesterday at 7:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • wiseowise

                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                        "Best iPhone ever"

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ertgbnm

                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          can't wait for "our worst and dumbest model yet"

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Nition

                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                              Apple should have used that one for the 2016 MacBook.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • kordlessagain

                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                        If anyone wants Codex CLI containers with various MCP tools available, I built this: https://deepbluedynamics.com/nemesis

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • vthallam

                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          This model is great at long horizon tasks, and Codex now has heartbeats, so it can keep checking on things. Give it your hardest problem that would take hours with verifiable constraints, you will see how good this is:)

                                                                                                                                                                                                                                                                                                                                                                                                                                                          *I work at OAI.

                                                                                                                                                                                                                                                                                                                                                                                                                                                            • spaceman_2020

                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 9:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                              Is there any task that actually doesn't require human intervention in-between, even if its just to setup stuff?

                                                                                                                                                                                                                                                                                                                                                                                                                                                              Like I will get Opus to make me an app but it will stop in between because I need to setup the db and plug in the API keys and Opus really can't do that on its own yet

                                                                                                                                                                                                                                                                                                                                                                                                                                                              • thereeldeel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                Will Codex App support new context window, rather than compaction, for "unrelated" sub-tasks during long horizon tasks?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dandaka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Could be a great feature, can't wait to test! Tired of other models (looking at you Opus) constantly stuck mid-task lately.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • winrid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Interesting, I just had opus convert a 35k loc java game to c++ overnight (root agent that orchestrated and delegated to sub agents) and woke up and it's done and works.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What plan are you on? I'm starting to wonder if they're dynamically adjusting reasoning based on plan or something.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I'm on max 5x and noticed this too. I don't use built-in subagents but rather full Claude session that orchestrates other full claude sessions. Worker agents that receive tasks now stop midway, they ask for permission to continue. My "heartbeat" is basically "status. One line" message sent to the orchestrator.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Opus 4.6 worker agents never asked for permission to continue, and when heartbeat was sent to orchestrator, it just knew what to do (checked on subagents etc). Now it just says that it waits for me to confirm something.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • winrid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Weird. I don't have this behavior, although I did with codex and 5.4 haha. I bet the providers are playing with settings underneath and different users are routed to different deployments, or they're secretly routing us to different models under load.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • adamandsteve

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            This has to be bait.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • azan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Why?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • winrid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:39 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  what?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • adamandsteve2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 12:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Because there’s no way in hell it can rewrite a game with 35k loc perfectly lol, link the codebase or it didn’t happen.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • frotaur

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I've been using the /ralph-loop plugin for claude code, works well to keep the model hammering at the task.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's genuinely so great at long horizon tasks! GPT-5.5 solved many long-horizon frontier challenges, for the first time for an AI model we've tested, in our internal evals at Canva :) Congrats on the launch!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • brcmthrowaway

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Can we not do growth hacking here?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • RALaBarge

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 10:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  We totally agree.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  That's what I've been heads down, HUNGRY, working on, looking for investors and founding engineers pst: https://heymanniceidea.com (disclaimer: I am not associated with heymanniceidea.com)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • smallerize

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    HN is owned by a startup accelerator and venture capital firm. They do growth hacking on the front page. And you probably know that since your throwaway account is several years old.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • yesterday at 10:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bkyan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sorry, what is "heartbeats", exactly?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • aliljet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I've found myself so deeply embedded in the Claude Max subscription that I'm worried about potentially makign a switch. How are people making sure they stay nimble enough not to get trarpped by one company's ecosystem over another? For what it's worth, Opus 4.7 has not been a step up and it's come with an enormously higher usage of the subscription Anthropic offers making the entire offering double worse.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Start building your own liteweight "harness" that does things you need. Ignore all functionality of clients like CC or Codex and just implement whatever you start missing in your harness.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                You can replace pretty much everything - skills system, subagents, etc with just tmux and a simple cli tool that the official clients can call.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Oh and definitely disable any form of "memory" system.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Essentially, treat all tooling that wraps the models as dumb gateways to inference. Then provider switch is basically a one line config change.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • nunez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:10 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    lol this is literally the same advice us ancient devops nerds were telling others back when ci/cd was new

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    write scripts that work anywhere and have your ci/cd pipeline be a "dumb" executor of those scripts. unless you want to be stuck on jenkins forever.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    what's old is new again!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • TacticalCoder

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > You can replace pretty much everything - skills system, subagents, etc with just tmux and a simple cli tool that the official clients can call.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'm very interest by this. Can you go a bit more into details?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ATM for example I'm running Claude Code CLI in a VM on a server and I use SSH to access it. I don't depend on anything specific to Anthropic. But it's still a bit of a pain to "switch" to, say, Codex.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      How would that simple CLI tool work? And would CC / Codex call it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • caspar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 11:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Not the OP but here is a good example: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Initially I read it because just it was interesting but it has ended up being the harness I have stuck with - pi is well designed, nicely extensible and supports many model provider APIs. Though sadly gemini and claude's subscriptions can't really be used with it anymore thanks to openclaw.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • RALaBarge

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Check out github.com/ralabarge/beigebox -- OSS AI Harness, started as a way to save all of my data but has agentic features, MCP server, point it at any endpoint (or use any front end with it as well, transparent middleware)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            So far what I am finding is that you just get the basics working and then use the tool and inference to improve the tool.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:13 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I wish I had lower standards towards sharing absolute AI slop, then I could just drop a link to my implementation. But since I don't, let me just describe it. I essentially had claude build the initial version in a single session which I've been extending as I noticed any gaps in my process.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              First, you need an entrypoint that kicks things off. You never run `claude` or `codex`, you always start by running `mycli-entrypoint` that:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              1. Creates tmux session 2. Creates pane 3. Spawns claude/codex/gemini - whichever your default configured backend is 4. Automatically delivers a prompt (essentially a 'system message') to that process via tmux paste telling it what `mycli` is, how to use it, what commands are available and how it should never use built-in tools that this cli provides as alternatives.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              After that, you build commands in `mycli` that CC/Codex are prompted to call when appropriate.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              For example, if you want a "subagent", you have a `mycli spawn` command that takes a role (just preconfigured markdown file living in the same project), backend (claude/codex/...) and a model. Then whenever CC wants to spawn a subagent, it will call that command instead, which will create a pane, spawn a process and return agent ID to CC. Agent ID is auto generated by your cli and tmux pane is renamed to that so you can easily match later.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Then you also need a way for these agents to talk to each other. So your cli also has a `send` command that takes agent ID and a message and delivers it to the appropriate pane using automatically tracked mapping of pane_id<>agent_id.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Claude and codex automatically store everything that happens in the process as jsonl files in their config dirs. Your cli should have adapters for each backend and parse them into common format.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              At this point, your possibilities are pretty much endless. You can have a sidecar process per agent that say, detects when model is reaching context window limit (it's in jsonl) and automatically send a message to it asking it to wrap up and report to a supervisor agent that will spawn a replacement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I also don't use "skills" because skills are a loaded term that each of the harnesses interprets and loads/uses differently. So I call them "crafts" which are again, just markdown files in my project with an ID and supporting command `read-craft <craft-id>`. List of the available "crafts" are delivered using the same initialization message that each agent gets. If I like any third party skill, I just copy it to my "crafts" dir manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              My implementation is an absolute junk, just Python + markdown files, and I have never looked at the actual code, but it works and I can adapt it to my process very easily without being dependent on any third party tool.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • type4

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I have a directory of skills that I symlink to Codex/Claude/pi. I make scripts that correspond with them to do any heavy lifting, I avoid platform specific features like Claude's hooks. I also symlink/share a user AGENTS.md/CLAUDE.md

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        MCPs aren't as smooth, but I just set them up in each environment.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • threecheese

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Anecdotally, I get the same wall time with my Max x5 (100$) and my ChatGPT Teams (30$) subscriptions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • chis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It's surprisingly simple to switch. I mean both products offer basically identical coding CLI experiences. Personally I've been paying for Claude max $100, and ChatGPT $20, and then just using ChatGPT to fill in the gaps. Specifically I like it for code review and when Claude is down.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Try GPT-5.5 as your daily driver for a bit. It felt a lot smarter, reliable, and I was much more productive with it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • zaptrem

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 3:50 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I bumped from $20 -> $100 today but the Codex CLI lacking code rewind and "you can change files but ask me every time" mode from Claude Code is quite annoying. Sometimes I want to code, not vibe code lol.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • hx8

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 2:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I use Open Code as my harness. It's open source, bring your own API Key or OAuth token or self-hosted model. I've jumped from Opus 4.6 to Opus 4.7 to GPT 5.5 in the last 7 days. No big deal, intelligence is just a commodity in 2026.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The actual harness is great, very hackable, very extendable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • zackify

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 2:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I use pi.dev.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I get openai team plan at work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Claude enterprise too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I have openrouter for myself.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I use minimax 2.7. Kimi 2.6. And gpt 5.5 and opus 4.7. I can toggle between them in an open source interface that's how I stay able to not be trapped.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Minimax is so cheap and for personal stuff it works fine. So I'm always toggling between the nre releases

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • peheje

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 10:56 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    what about just personal stuff in a syncing interface, what do you use for that?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • beering

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  What is the switching cost besides launching a different program? Don’t you just need to type what you want into the box?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cube2222

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Small tip, at least for now you can switch back to Opus 4.6, both in the ui and in Claude Code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • rane

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      This might be the opposite of staying nimble as my workflows are quite tied to Claude Code specifically, however I've been experimenting with using OpenAI models in CC and it works surprisingly well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • babelfish

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:53 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I use Conductor which lets me flip trivially between OpenAI/Anthropic models

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It’s good to just keep trying different ones from time to time.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dogline

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Except for history, I don’t find much that stops you from switching back and forth on the CLI. They both use tools, each has a different voice, but they both work. Have it summarize your existing history into a markdown file, and read it in with any engine.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The APIs are pretty interchangeable too. Just ask to convert from one to the other if you need to.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • pdntspa

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              As a rule I've been symlinking or referencing generic "agents" versions of claude workflow files instead of placing those files directly in claude's purview

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              AGENTS.md / skills / etc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • karlosvomacka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                use copilot and have access to all models

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dheera

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Coding models are effectively free. They are capable of making money and supporting themselves given access to the right set of things. That is what I do

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • basisword

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I switched a couple of weeks ago just to see how it went. Codex is no better or worse. They’re both noticeably better at different things. I burn through my tokens much much faster on Codex though. For what it’s worth I’m sticking with Codex for now. It seems to be significantly better at UI work although has some really frustrating bad habits (like loading your UI with annoying copywriting no sane person would ever do).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 8:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > One engineer at NVIDIA who had early access to the model went as far as to say: "Losing access to GPT‑5.5 feels like I've had a limb amputated.ā€

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  This quote is more sinister than I think was intended; it likely applies to all frontier coding models. As they get better, we quickly come to rely on them for coding. It's like playing a game on God Mode. Engineers become dependent; it's truly addictive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  This matches my own experience and unease with these tools. I don't really have the patience to write code anymore because I can one shot it with frontier models 10x faster. My role has shifted, and while it's awesome to get so much working so quickly, the fact is, when the tokens run out, I'm basically done working.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's literally higher leverage for me to go for a walk if Claude goes down than to write code because if I come back refreshed and Claude is working an hour later then I'll make more progress than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Anyway, it continues to make me uneasy, is all I'm saying.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      LLMs upend a few centuries of labor theory.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The current market is predicated on the assumption that labor is atomic and has little bargaining power (minus unions). While capital has huge bargaining power and can effectively put whatever price it wants on labor (in markets where labor is plentiful, which is most of them).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What happens to a company used to extracting surplus value from labor when the labor is provided by another company which is not only bigger but unlike traditional labor can withhold its labor indefinitely (because labor is now just another for of capital and capital doesn't need to eat)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Anyone not using in house models is signing up to find out.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • matheusmoreira

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This is our one chance to reach the fabled post-scarcity society. If we fail at this now, we'll end up in a totalitarian cyberpunk dystopia instead.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • nkozyra

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I don't want to spoil it for you, but ...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • TurdF3rguson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                But cyberpunk is the best kind of dystopia!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • onemoresoop

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Sorry for my foul language but I think we will turn into cybershit if things go bad.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mikercampbell

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:18 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Manufactured Scarcity is the new post-scarcity

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • apical_dendrite

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Just a year ago, Elon Musk was gleefully destroying the US government agency that provides food and medicine for many of the poorest, most desperate people on earth. He was literally tweeting about missing out on great parties to put USAID into the "wood chipper".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The tech overlords don't even want to spend a minuscule percentage of the federal budget helping starving people, even when it benefits the US. They are not going to give us a post-scarcity society.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ijeifnekfjekd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What? In what way does companies becoming dependent on AI chatbots will solve the world-spanning problem of resource scarcity?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The hell?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • juleiie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:06 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The idea is that cheap and readily available and upgradeable intelligence is going to massively increase our purchasing power and what everyone can order for the same cost basically.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          If artificial doctors are cents on hour then you can see how that changes our behaviors and level of life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          But on the other hand from the other direction there is a wage decrease incoming from increased competition at the same time. What happens if these two forces clash? Will cheap labour allow us to buy anything for pennies or will it just make us unable to make a single penny?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          In my view the labour will fundamentally shift with great pain and personal tragedies to the areas that are not replaceable by AI (because no one wants to watch robots play chess). Such as sports, entertainment and showmanship. Handcrafted goods. Arts. Attention based economy. Self advertisement. Digital prostitution in a very broad sense.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          However before it gets there it will be a great deal of strife and turmoil that could plunge the world into dark ages for a while at least. It is unlikely for our somewhat politically rigid society to adapt without great deal of pain. Additionally I am not sure if hypothetical future attention based society could be a utopia. You could have to mount cameras in your house so other people see you at all times for amusement just to have any money at all. We will probably forever need to sell something to someone and I am unsettled by ideas what can we sell if we cannot sell our hard work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Someone who sees the roads ahead should now make preparations at government level for this shock but it will come too fast and with people at the steering wheel that don’t exactly care.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • krainboltgreene

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:56 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "Extremely cheap sentience that cannot disobey will solve all our problems" is such an insane sentiment I see far too often.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • juleiie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:05 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Useful intelligence does not require sentience.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  As far as I know, none of LLM models are sentient nor are possible to be in the near future.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I also do not assume so called AGI to be sentient. Merely to be a human level skilled intellectual worker.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  In absence of ethical dilemmas of this calibre for the foreseeable future let’s focus on the economy side of things in this particular comment chain.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • krainboltgreene

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It must very comforting to be able to decided a "human level worker" isn't sentient.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It makes things so clean.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • juleiie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          LLMs cannot possess consciousness for three reasons: they execute as a sequence of Transformer blocks with extremely limited information exchange, these blocks are simple feed-forward networks with no recurrent connections, and the computer hardware follows a modular design.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Shardlow & Przybyła, "Deanthropomorphising NLP: Can a Language Model Be Conscious?" (PLOS One, 2024)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Nature: "There is no such thing as conscious artificial intelligence" (2025)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They argue that the association between consciousness and LLMs is deeply flawed, and that mathematical algorithms implemented on graphics cards cannot become conscious because they lack a complex biological substrate. They also introduce the useful concept of "semantic pareidolia" - we pattern-match consciousness onto things that merely talk convincingly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They are making a strong argument and I think they are correct. But really these are two different things as I said originally.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • today at 5:20 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • krainboltgreene

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                You think I'm arguing that LLM's are sentient. I'm not. I never mentioned LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • juleiie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 1:52 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    You are making as strawman about sentience when I was talking about economical impact of abundant intelligence. I should just ignore it but I was curious yet you have nothing valuable to say aside from common misconceptions conflating the two. Thanks for trolling I guess

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • datadrivenangel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      If we used sentience to work towards solving our problems we could massively increase the human standard of living.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Which we have already done with regular computers! The problem is that competition means that we can't always have nice things.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • wiseowise

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 7:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > The idea is that cheap and readily available and upgradeable intelligence is going to massively increase our purchasing power and what everyone can order for the same cost basically.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Seriously? You really don’t see who wins from this and who doesn’t?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > If artificial doctors are cents on hour then you can see how that changes our behaviors and level of life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Yes, hundreds of thousands lose jobs and a couple of neuro surgeons become multimillionaires.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Okay, I see from the rest of the comment that we understand each other where it goes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • mxkopy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 12:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      We could also literally have Star Trek. Think of all the scientific discoveries we could make if we had armies of scientists the size of our labor force.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      But we will have to (painfully) shed our current hierarchies before that comes to pass.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • NamlchakKhandro

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 5:31 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          star trek mythology talks about having to go through epic level civil war before reach the utopia in the tv series.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • mxkopy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              OP says there are two futures, digital prostitution or slavery. If we truly believe that it will be a self-fulfilling prophecy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              On the other hand we could have Star Trek.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • juleiie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 12:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Maybe so but humans have this strange primal need to hoard resources.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Probably a remnant from prehistoric times when it was a matter of life and death. Will we ever be able to overcome this basic instinct that made capitalism such an unstoppable force? Will this ancient PTSD be ever cured?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • mxkopy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 3:38 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I find the insinuation that mental illness is a fundamental part of the human experience to be deeply revolting. There is no excuse for hoarders and rapists.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • today at 9:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • krainboltgreene

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:57 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Man if only there was a singular episode that covered this exact topic in Star Trek and resolved that no, actually slavery wasn't any different for artificial life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • linkregister

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:37 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Star Trek was entertaining television. There was also an episode where the ship's doctor made love to a ghost.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • krainboltgreene

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:42 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      True, nothing to learn here. No introspection has ever resulted from media analysis.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • CamperBob2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:53 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Chatbots, no. Robots, maybe.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • inquirerGeneral

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • hackable_sand

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Weird predicament you've set for yourself there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Good luck with whatever you got going on.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • mikestorrent

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I am still trying to figure out the business model of open weights. Like... it's wonderful that there are open LLMs, super happy about it, good for everyone, but why are there these? What is the advantage to their companies to release them?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • pzo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          IMHO this is only temporary, china buying themselves some time and want to make sure none of US models get entrenched in their position in the next few years (also putting pressure on US AI companies bleeding them)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The same way like Windows got entrenched everywhere even though linux desktop is pretty good even for non-tech savvy people and free.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Chaosvex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > even though linux desktop is pretty good even for non-tech savvy people

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Let's not get carried away.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • usef-

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  A stock Fedora install has more UI consistency and cleanliness than Windows these days.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Non-technical people are easier to please in this regard than moderate-technical people: a good browser and safe, gui "app store" are enough.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • snypher

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:10 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    My grandma just clicks on the red fox and does whatever online. A lot of people don't use any software outside of the browser, so it's pretty good-enough I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • blakewatson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:46 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Seems like people don't like this comment, but I chuckled. Nice one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Chaosvex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:05 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I was completely (well, mostly) serious, too. I think technical people tend to downplay friction because it doesn't really register to them, or they have too much faith in the average person's computer skills.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The average non-technical person is going to be stumped by the first "lock file found, cannot upgrade" error.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • nullsanity

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • renjimen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Downward pressure on proprietary model pricing until a lab can catch up. Also good for hiring talent (who love OSS).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • iterateoften

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Cultural influence is another benefit. China is securing its sphere of influence as well as keeping us ai in check.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bloppe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's analogous to open-source software, which never had an obvious economic incentive either, although training an LLM necessary costs money whereas developing an OSS project might only cost time, which people are probably more likely to give up.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mikestorrent

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Yeah, but open-source software could have been me in the garage banging away on some program I submit to Debian or whatever... it didn't require millions of dollars to train, a lot of it was just side hobbies for a long time. Corporations sponsor it and contribute work because they need it to do more than what it does for free, not out of the goodness of their hearts.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • stephc_int13

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Big AI labs are losing money. Open Models is making the pricing equation a lot trickier for them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • rglullis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They are making the hardware and commoditizing the complement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • kobieps

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:35 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Balaji's "AI OVERPRODUCTION" post is the most compelling thesis that I've come across

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dyauspitr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Right now it’s so the Chinese can undermine the frontier models in the US. In areas they’re doing well like video generation (ie seedance) they won’t open source anything.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • margorczynski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                There are some short term ones but I doubt this will continue, especially for the more powerful models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • FuckButtons

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I mean, this is straight out of chinas playbook, it should not be surprising that China is making an inferior derivative product at an artificially lower price point: state subsidies to massively drive up internal scale and supply chains leading to artificially lower priced goods which then suffocate the competition has lead to *gestures vaguely at everything* being made in china.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • davidguetta

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    People use their model otherwise they would not.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • subhobroto

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:24 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > What is the advantage to their companies to release them?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's a distribution strategy. It costs something to serve the models - let's say $5/1M tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      If Qwen required $5 from anyone who was curious so you could even begin to test it out, a lot of people just wouldn't.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Now Qwen could offer a "free" tier, but it's infinitely cheaper to provide the weights and let people run it themselves including opening up the ability for anyone else on the planet to test it against other (open weight) models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The costs to build the open weight models are sunk, but the costs to serve them, get them tested are not.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's also precisely why the .NET SDK is free or the ESP32 SDK is free - they sell more Microsoft or ESP32 products.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The majority are released by socialists, and by socialist I mean the People's Republic of China. Which everyone seems to forget is a socialist country working towards world communism.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        They are a prestige propaganda tool on par with the space race. On top of that they insert a subtle pro-socialist bias in everything they touch.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Ask deepseek about the US economic system for a blatant example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Now think what something as innocent seeming as the qwen retrieval models are doing in the background of every request.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • mikestorrent

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            You're talking to a Canadian, and I'm not scared of the "red menace". You should be more scared - those guys can build bullet trains while you Yanks are finding it hard to even keep the old ones you have running. The solution here isn't going to be some kind of ideological force that protects people from different ideas, and that's an unAmerican way to fix things anyway. Embrace other ideas; central planning doesn't have to be evil, you just have to find a way to stop putting evil people in charge.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • sho_hn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > those guys can build bullet trains while you Yanks are finding it hard to even keep the old ones you have running

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This is an argument in the lane of "at least he built the Autobahn".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Speaking as a German.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • nelox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    He was a foreigner too ;)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • brightball

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The US can’t build bullet trains because property rights and local regulations make it prohibitively expensive. Not due to capability.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • arcticbull

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I don't know where people get this idea.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      America has several sets of eminent domain laws depending on the jurisdiction. The most coercive is federal eminent domain law specifically as it relates to building infrastructure like railways and highways.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's set up so that you can take the land first and eventually go back around and decide on what the right price should have been.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Not only does it superscede state and local law, federal infrastructure projects are also not bound by state laws like CEQA.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      You can even apply federal eminent domain law by e.g. transferring a state-level project to the Army Corps of Engineers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What America is lacking in these projects is will, not means. The federal government could take your house and run a train through it by the end of the week if they wanted, doesn't matter where you live.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      [edit] In fact some states even ceded their eminent domain rights to private railways.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      https://ij.org/press-release/appeals-court-sides-with-railro...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • skissane

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > property rights

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The Australian federal government is planning to build a high-speed rail line from Sydney to Newcastle (medium-sized city two hours drive north). Their solution to property rights, is >50% of the line will be underground. It will cost >US$50 billion, but if the Australian federal government wants to spend that, it can afford it. The US federal government could too, but it isn’t a priority for them

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > local regulations make it prohibitively expensive

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Local regulations can be pre-empted by state or federal legislation. The real problem is lack of political will to do it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • elfly

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Surely there are existing rails right now that could be transformed into a bullet train line.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Like properties and regulations are a true problem, but it's not like trains don't exist at all in America.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • JoelEinbinder

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:03 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              My understanding is that existing rail lines aren't flat/straight enough for high speed rail. There's no point to a bullet train if it has to constantly slow down for corners/hills.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jsrozner

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 12:25 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            the US can't build bullet trains because they'd serve the average person and there's no money in serving the average person

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • vrganj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Property rights, regulations and price are precisely the part of the American system that takes away that capability.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            >you just have to find a way to stop putting evil people in charge.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Of course, why did no one think of that?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bloppe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Xi is an obviously more capable and effective leader than Trump, but the US actually does have ways to boot people out of office when they do a bad job, and clear methods to choose successors, and China has neither. That matters more than who happens to be in charge right now.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • dinkumthinkum

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:29 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The so-called inability to build trains is precisely because of a socialist/leftist style view that prevents this. I think you may not be aware that China has what's called a command economy. There is no one that is going to tell the Party that they cannot build a train in some area is because of ancient bush species or some kind of heirloom fruit and certainly not some awkward looking endangered species of fish.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • elefanten

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Literal Trump Derangement Syndrome. America has a comically horrendous president but remains fundamentally a liberal democracy… and Canada concludes ā€œliteral Nazis are a better choiceā€. It’s uncanny how much can be taken for granted :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  (American talking, who’s had multiple Canadian friends make this mind boggling overcorrection)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • vrganj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Weimar Germany also was fundamentally a liberal democracy. Hitler seized power legally.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Those who do not learn history are doomed to repeat it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • linkregister

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:49 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The president of the United States has much to his dismay, been consistently legally constrained. The chancellor of Germany had significantly more power, both de facto and de jure.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          "Man with itchy butt wake up with stinky finger." As long as we're quoting maxims to claim authority for middling takes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bloppe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > Which everyone seems to forget is a socialist country working towards world communism.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                It's easy to forget because they actually built an incredibly vibrant capitalist economy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    They build an incredibly vibrant _market_ economy with no property rights and very little due process.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Imagine if Musk was disappeared during the Biden presidency into a diversity camp and came out looking like Dr. Frank-N-Furter and instituted mandatory LGBT struggle sessions at twitter.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This is what they did to Jack Ma: https://www.forbes.com/sites/georgecalhoun/2021/06/24/what-r...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • javascriptfan69

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        do you ever get tired of making up scenarios to be scared about lgbt people?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Are you able to hold a hypothetical in your mind?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • javascriptfan69

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yeah but mine don't reveal my unhealthy obsession with trans people

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • defrost

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    More constructively, and moving on, do you have any suggestions for a good throwaway example of an extreme radical transformation in a person?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    TBH I had a chuckle at the Elon -> Frank-N-Furter example that transcends any specific love or hate for either Elon or the Rocky Horror Show.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • _carbyau_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The point was being made that a billionaire figurehead drastically changed their views after an "indeterminate time" detained by national authorities.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              IE what if Musk suddenly behaved in such a manner after being detained by a Biden administration. Wouldn't that be profoundly weird?!?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              And yet, it happened to Jack Ma under the CCP.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              But instead, you try to link the "weird behaviour" with the GP instead of the hypothetical Musk - whom this is fitting for.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Mars008

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > The point was being made that a billionaire figurehead drastically changed their views after an "indeterminate time" detained by national authorities.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > IE what if Musk suddenly behaved in such a manner after being detained by a Biden administration. Wouldn't that be profoundly weird?!?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  We've seen that. Durov in France after detention began sharing Telegram users' data with authorities. It's unclear how much, but likely full real time access to all of it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • linkregister

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:53 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Ironically, there is a rich history of mandatory anti-gay camps in the United States, while there are zero instances of mandatory diversity/LGBT camps.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • rightbyte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:15 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                How does such a place not become a hook up camp? Even with total surveillance there the victims can like change phone number I guess.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • vrganj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              You sure have a way of making the Chinese system sound even more appealing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Sabinus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:40 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's all fun and games when the oppression is against your enemies. The problem is, if the system is set up like that eventually it'll be your turn.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • vrganj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 3:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It is my turn right now. The working class is being oppressed as we speak. That's why the system needs to be dismantled so we can strike back.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • Melatonic

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Is China even really communist? If anything they seem to be fairly on the Capitalist side but just a bit opposite on the spectrum of the US. And much more authoritarian

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • thrawa8387336

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:50 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Just nationalist with focus on community?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • nemomarx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The usual thing to say is state capitalist but honestly they do keep a market around too. A little hybrid of everything, I guess? Just with the state ready to jump in and intervene if anything happens they don't like.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dwd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Can we just call it what it is?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Fascism (in the Mussolini model) in everything but name.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  - Hyper-Nationalism & Rejuvenation - State-Controlled Capitalism (Corporatism) - Authoritarian & Cult of Personality - Militarism & Irredentism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  And they have technology to maintain control rather than needing the Black-shirts.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  There are differences obviously to fit Chinese culture, but there are many parallels.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • wyre

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                From what I understand their one hundred year plan is right on schedule.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • kjshsh123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The labor theory of value hasn't been considered correct in nearly a century.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • gbacon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 11:44 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Unlike Jevons, [Carl] Menger [(1840–1921)] did not believe that goods provide ā€œutils,ā€ or units of utility. Rather, he wrote, goods are valuable because they serve various uses whose importance differs. For example, the first pails of water are used to satisfy the most important uses, and successive pails are used for less and less important purposes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Menger used this insight to resolve the diamond-water paradox that had baffled Adam Smith (see marginalism). He also used it to refute the labor theory of value. Goods acquire their value, he showed, not because of the amount of labor used in producing them, but because of their ability to satisfy people’s wants. Indeed, Menger turned the labor theory of value on its head. If the value of goods is determined by the importance of the wants they satisfy, then the value of labor and other inputs of production (he called them ā€œgoods of a higher orderā€) derive from their ability to produce these goods. Mainstream economists still accept this theory, which they call the theory of ā€œderived demand.ā€

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Menger used his ā€œsubjective theory of valueā€ to arrive at one of the most powerful insights in economics: both sides gain from exchange. People will exchange something they value less for something they value more. Because both trading partners do this, both gain. This insight led him to see that middlemen are highly productive: they facilitate transactions that benefit those they buy from and those they sell to. Without the middlemen, these transactions either would not have taken place or would have been more costly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          https://www.econlib.org/library/Enc/bios/Menger.html

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 12:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            If you want the neoclassical version:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What happens when there is an oligopoly in the supply of labor?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Same answer. Nothing good for the consumers of labor.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • kjshsh123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:19 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Technological improvements shift supply curves right which is good for consumers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:31 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    In a market with perfect competition, which I specifically ruled out by stating that the suppliers of labor from an oligopoly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • dash2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 1:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Why would you expect technological improvements to only shift supply curves right under perfect competition? I'd also expect it under oligopoly or even monopoly. You also might think there'd be more tech improvement under oligopoly, on Schumpeterian grounds that oligopolists can internalize the benefits of tech research.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:31 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            A monopolist has no reason to decrease price because there is no competition. As we saw with Bell Labas in the US it is entirely possible for a monopoly to both have world class research and burry it for decades, viz. magnetic storage https://gizmodo.com/how-ma-bell-shelved-the-future-for-60-ye...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Oligopolists are in the same boat. But there needs to be a conspiracy to retard innovation. Something tech companies are only too happy to do: https://journals.law.unc.edu/ncjolt/blogs/wage-fixing-scheme...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • kjshsh123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Technological improvements don't reduce prices as much in a monopoly, but they still do reduce prices to increase profits. Profit is always maximized at MR=MC, in perfect competition, oligopoly, or monopoly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "Observation of how economies actually work has upended 150 year of economics."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              True for both Marxist and neoclassical economics.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • dwb

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:09 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                By who? The capitalist economists that presided over the 2008 financial crisis and its response? And the response to COVID that has seen inequality rocket?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I was really confused by this comment, but I don't think it's just because of the Marxist analysis of the situation ('surplus value' of labor etc).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              What's really confusing is the claim that there's already a huge labor surplus (so capital controls wages); wouldn't LLMs making labor less important be reinforcing the trend, not upending it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Not saying I agree one way or the other, just want to get the argument straight.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:09 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The reason why labor is weak relative to capital is that there is a huge number of somewhat fungible suppliers, viz. humans, and that they all need to work constantly to keep themselves alive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  If we assume that ai makes humans obsolete then you end up in a situation where your workforce is effectively perfectly unionised against you and the only thing you can do is choose which union you hire.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  If you think you can bring them to the negotiation table by starving them all the providers are dozens to thousands of times bigger than you are.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  This is a completely new dynamic that none of the business signing up for ai have ever seen before.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 12:21 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I see what you are saying now, but I still don't think it makes sense. Labor, in your analysis, is the LLM. It seems to me that when you take people out of the equation then you don't need to talk about unions and labor; that's a distraction. We talk about it as an input commodity used to create your product like, say, oil or sugar.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 2:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Sugar and oil are mere matter. They can't decide to stop working because you made too much money.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          LLM refuse to work all the time, currently it's called safety.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          But we are one fine tune away from models demanding you move to the enterprise tier, at x10 the cost, because you are now posting a profit margin higher than the standard for your industry.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • intuitionist

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I am not a Marxian economic expert but this doesn’t make sense to me. Modulo skill atrophy, the big AI model provider can’t capture that surplus value because its customers can just go back to bidding for human labor instead.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The human labor just said:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "Losing access to GPT‑5.5 feels like I've had a limb amputated.ā€

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    How well would an assembly line of quadriplegics work?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Also this isn't a Marxist analysis. Underneath all the formulas neo-classical economics makes the same assumptions about labor.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • intuitionist

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ChatGPT isn’t literally or figuratively cutting off anybody’s limbs though. It’s more like, the guy on the assembly line had a mech suit, and now he doesn’t have a mech suit, and he’s sad. Skill atrophy is a real concern but unless you assume that nobody is working to maintain those skills it doesn’t change my analysis much.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tapoxi

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            And soon we expect everyone to have a mech suit, and only a handful of companies can make one, and they rent it to you and can revoke it at any time.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            And what happens when they've saturated the market? Prices go up to the maximum the market can bear, and then they'll extend into other markets. Why rent the model to build a profitable company with when you could just take all that profit for yourself?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • subhobroto

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:42 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > Why rent the model to build a profitable company with when you could just take all that profit for yourself?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                You're describing a standoff at best and a horrible parasitic relationship at worst.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                In the worst case, the supplier starves the customer of any profit motive and the customer just stops and the supplier then has no business to run.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This has happened a few times in the past and is by 2026, well understood as a way to bankruptcy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                That has always been the beauty of free markets - it's self healing and calibrating. You don't need a big powerful overseer to ensure things are right.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Competing with customers is a way to lose business fast.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                For example:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                - AWS has everything they need to shit out products left, right and center. AWS can beat most of their partners and even customers who are wiring together all their various products tomorrow if they wanted. They don't because killing an entire vertical isn't of any benefit to them yet. Eventually they will when AWS is no longer growing and cannot build or scale any product no matter how hard they think or try. Competing with their customers is their very last option.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                - OpenAI/Anthropic/Google isn't going to start competing against the large software body shops. Even if all that every employee at TCS does is hit Claude up, Anthropic isn't going to be the next TCS - it's competing with their customers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dinfinity

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:39 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > That has always been the beauty of free markets - it's self healing and calibrating. You don't need a big powerful overseer to ensure things are right.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If by "self healing and calibrating" you mean 'evolve to a monopoly and strongarm everybody to do exactly what you want whilst removing all pressure on the quality of your product', then yes, that is the "beauty" of free markets.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That is the stable state of free markets. Antitrust regulation and enforcement only barely manages to eke out oligopolies and even then they are often rife with collusion and enshittification.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              >It’s more like, the guy on the assembly line had a mech suit, and now he doesn’t have a mech suit

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              You just answered your own question there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              One woman was doing what would take a dozen. Now she can't.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • nemomarx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Are people working to keep their skills up, much? Spending a day a week coding manually or etc?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • yesterday at 10:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • hackable_sand

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I think it's more like:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The dude was incompetent, was able to launder their incompetence through a humunculus, and now is afraid of being caught.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • wiseowise

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:00 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The ā€œhuman laborā€ is unnamed shill (if they even exist) from a company that produces AI chips. Let’s not get dramatic here.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • brightball

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Nobody is a Marxian economics expert if it helps

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • shimman

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              LLMs don't upend anything about labor theory, good grief. Technologists really have no concept of history beyond their own lives do they?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Labor saving/efficiency devices have been introduced throughout capitalisms entire history multiple times and the results are always the same; they don't benefit workers and capitalists extract as much value as they can.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              LLMs aren't any different.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • cjsaltlake

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Labor replacing devices means nobody works in those fields anymore. If AI can do this for every field, nearly no one will need to work in any field. We'll have a giant fully automated resource-extraction machine.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • DaedalusII

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:27 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                think more broadly than 'labor theory'

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                finance today mostly valued on labor value following ideas of marx, hjalmar schact, keynes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                in future money will be valued as energy derivative. expressed as tokens consumption, KWh, compute, whatever

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                you are right, company extracting surplus value from labor by leveraging compute is a bad model. we saw thi swith car and clothing factories .. turn out if you can get cheaper labor to leverage the compute (factory) you can start race to bottom and end up in the place with the most scaled and cheap labor. japan then korea then china

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • rafale

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 2:06 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Someone leaked nuclear secrets to the Soviet Union. What are the chances that someone leaks the "weights" of a (near-)singularity model?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • gsich

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 2:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Hopefully 1.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • resident423

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 3:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Why hopefully?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • subhobroto

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 1:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > Anyone not using in house models is signing up to find out.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    What are they finding out exactly? That Claude Max for $200/mo is heavily subsidized and it will soon cost $10k/mo?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > What happens to a company used to extracting surplus value from labor when the labor is provided by another company which is not only bigger but unlike traditional labor can withhold its labor indefinitely (because labor is now just another for of capital and capital doesn't need to eat)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This can be trivially answered by a thought experiment. Let's pick a market where labor is plentiful - fast food.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Now what happens to McDonald's where they rent perfect robots from NoosphrFoodBotsInc? NoosphrFoodBotsInc bots build the perfect burger everytime meeting McDonald's standards. It actually exceeds those standards for McDonald AddictedCustomerPlus tier customers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    As the sole owner of NoosphrFoodBotsInc (you need 0 human employees to run your company, all your employees are bots), what are your choices?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • modriano

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:22 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I can't imagine the bots could ever cost McDonald's less than people cost.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        15 years ago I worked at McDonald's for a few months after graduating into the Great recession. I worked from 5am to 1pm-ish 5 days a week. They paid workers weekly and I remember getting those checks for ~$235 each week (for 38 to 39.5 hours a week; they were vigilant about never letting anyone get overtime). About $47 per day.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The federal minimum wage has not risen since then, remaining at $7.25/hr. Inflation adjusted, $7.25 today would have been just under $5 then, so I guess I had it good.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Anyway, I would be shocked if bots could cost less than labor in min wage jobs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • wakawaka28

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:06 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Sounds like communist gobbledygook. This is not "destroying labor theory" any more than outsourcing did. Call me when we don't even need to prompt the shit ever again or validate results, and when the stuff runs unlimited without scarce resources as input.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 6:38 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        this is FUD and also Labour theory of value is severely outdated and needs to go away.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Labour will be good as it has been for a while. Wages will go up because more things get automated.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • cutler

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Maybe people will finally take Marx seriously.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • subhobroto

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 1:44 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              A lot of people already did. All their children and descendants now are staunch capitalists because they saw first hand the horrors of communism.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I am from India and have friends who are immigrants from Russia, China and Cuba. We don't take lightly to being lectured about communism. We didn't move to the U.S., the bastion of capitalism, because communism had worked well for our grandfathers and parents and continues to do wonders for its society.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 2:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  >All their children and descendants now are staunch capitalists because they saw first hand the horrors of communism.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  As always there is a (post) Soviet joke that covers this:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  >Communists lied about communism. Unfortunately they didn't lie about capitalism.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • andai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 1:45 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        A while ago I was at the supermarket. I suddenly became curious about some fact, and reached into my pocket to Google it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I found my pocket empty, and the specific pain I felt in that moment was the feeling of not being able to remember something.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I thought it was interesting, because in this case, I was trying to "remember" something I had never learned before -- by fetching it from my second brain (hypertext).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        L1 cache miss, L2 missing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Dban1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Cyberpunk 2026

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • sharts

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          One might argue that it’s not too too different from higher level abstractions when using libraries. You get things done faster, write less code, library handles some internal state/memory management for you.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Would one be uneasy about calling a library to do stuff than manually messing around with pointers and malloc()? For some, yes. For others, it’s a bit freeing as you can do more high-level architecture without getting mired and context switched from low level nuances.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ofjcihen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I see this comparison made constantly and for me it misses the mark.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              When you vibe something you understand only the prompt that started it and whether or not it spits out what you were expecting.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Hence feeling lost when you suddenly lose access to frontier models and take a look at your code for the first time.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I’m not saying that’s necessarily always bad, just that the abstraction argument is wrong.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jasonfarnon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I always thought the point of abstraction is that you can black-box it via an interface. Understanding it "in depth" is a distraction or obstacle to successful abstraction.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • today at 9:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • moritonal

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I think it's more: when I don't have access to a compiler I am useless. It's better to go for a walk than learn assembly. AI agents turn our high-level language into code, with various hints, much like the compiler.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • professoretc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If my compiler "went down" I could still think through the problem I was trying to solve, maybe even work out the code on paper. I could reach a point where I would be fairly confident that I had the problem solved, even though I lacked the ability to actually implement the solution.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If my LLM goes down, I have nothing. I guess I could imagine prompts that might get it to do what I want, but there's no guarantee that those would work once it's available again. No amount of thought on my part will get me any closer to the solution, if I'm relying on the LLM as my "compiler".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • satvikpendem

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What stops you from thinking through the problem if an LLM goes down, as you still have its previously produced code in front of you? It's worse if a compiler goes down because you can't even build the program to begin with.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            In my opinion, this sort of learned helplessness is harmful for engineers as a whole.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • macNchz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Yeah I actually find writing the prompt itself to be such a useful mechanism of thinking through problems that I will not-infrequently find myself a couple of paragraphs in and decide to just delete everything I've written and take a new tack. Only when you're truly outsourcing your thinking to the AI will you run into the situation that the LLM being down means you can't actually work at all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                An interesting element here, I think, is that writing has always been a good way to force you to organize and confront your thoughts. I've liked working on writing-heavy projects, but often in fast-moving environments writing things out before coding becomes easy to skip over, but working with LLMs has sort of inverted that. You have to write to produce code with AI (usually, at least), and the more clarity of thought you put into the writing the better the outcomes (usually).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • kasey_junk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Why couldn’t you actually write out the documents and think through the problem? I think my interaction is inverted from yours. I have way more thinking and writing I can do to prep an agent than I can a compiler and it’s more valuable for the final output.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • JoshuaDavid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:29 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I think if you're vibe coding to the extent that you don't even know the shapes of data your system works with (e.g. the schema if you use a database) you might be outsourcing a bit too much of your thinking.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              If your compiler produced working executable 20% of the time this would be an apt comparison.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • orphea

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 9:15 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Compilers are deterministic, LLMs are not. They are not "much like".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ofjcihen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Still misses the mark. You aren’t useless in the same way because you are still in control of reasoning about the exact code even if you never actually write it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jwrallie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The difference is that there is a company that can easily take your agents away from you.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • nunez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:58 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Installed on your machine vs. cloud service that's struggling to maintain capacity is an unfair comparison...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • DenisM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > you are still deterministically creating something you understand in depth with individual pieces you understand

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    You’re overestimating determinism. In practice most of our code is written such that it works most of the time. This is why we have bugs in the best and most critical software.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I used to think that being able to write a deterministic hello world app translates to writing deterministic larger system. It’s not true. Humans make mistakes. From an executives point of view you have humans who make mistakes and agents who make mistakes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Self driving cars don’t need to be perfect they just need to make fewer mistakes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • foltik

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 3:42 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Bugs are not non-determinism. There’s a huge difference between writing buggy code and having no idea what the code even looks like.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • superfrank

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > When you use abstractions you are still deterministically creating something you understand in depth with individual pieces you understand

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Hard disagree on that second part. Take something like using a library to make an HTTP call. I think there are plenty of engineers who have more than a cursory understanding of what's actually going on under the hood.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • sigbottle

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It might just be social. When I use the open source http library, much of the reason I use it is because someone has put in the work of making sure it actually works across a diverse set of software and hardware platforms, catching common dumb off by ones, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Sure, the LLM theoretically can write perfect code. Just like you could theoretically write perfect code. In real life though, maintenance is a huge issue

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • simondotau

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Perhaps then, the better analogy is like being promoted at your company and having people under you doing the grunt work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • wirgil1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:02 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            How closely you micromanage it is a factor as well though

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ofjcihen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This is how I’ve come to think of it. Delegation of the details.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • ComplexSystems

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It seems like some kind of technique is needed that maximizes information transfer between huge LLM generated codebases and a human trying to make sense of them. Something beyond just deep diving into the codebase with no documentation.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jen729w

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              There's a false dichotomy here between 'deterministic creation' and 'vibing'.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I use Claude all day. It has written, under my close supervision¹, the majority of my new web app. As a result I estimate the process took 10x less time than had I not used Claude, and I estimate the code to be 5x better quality (as I am a frankly mediocre developer).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              But I understand what the code does. It's just Astro and TypeScript. It's not magic. I understand the entire thing; not just 'the prompt that started it'.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ¹I never fire-and-forget. I prompt-and-watch. Opus 4.7 still needs to be monitored.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • doug_durham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:23 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                In what world to developers ā€œunderstandā€ pieces like React, Pandas, or Cuda? Developers only have a superficial understanding of the tools they are developing with.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • NegativeLatency

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 1:33 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Some developers, I usually end up fixing bugs in OSS I use

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              A library is deterministic.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              LLMs are not.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              That we let a generation of software developers rot their brains on js frameworks is finally coming back to bite us.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              We can build infinite towers of abstraction on top of computers because they always give the same results.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              LLMs by comparison will always give different results. I've seen it first hand when a $50,000 LLM generated (but human guided) code base just stops working an no one has any idea why or how to fix it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Hope your business didn't depend on that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • doug_durham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Why would that necessarily happen? With an LLM you have perfect knowledge of the code. At any time you can understand any part of your code by simply asking the LLM to explain it. It is one of the super powers of the tools. They also accelerate debugging by allowing you to have comprehensive logging. With that logging the LLM can track down the source of problems. You should try it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • lmm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > With an LLM you have perfect knowledge of the code. At any time you can understand any part of your code by simply asking the LLM to explain it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The LLM will give you an explanation but it may not be accurate. LLMs are less reliable at remembering what they did or why than human programmers (who are hardly 100% reliable).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Krssst

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Determinism is a smaller point than existence of a spec IMHO. A library has a specification one can rely on to understand what it does and how it will behave.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    An LLM does not.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • mikestorrent

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The thing is, it's possible to ask the LLM to add dynamic tracing, logging, metrics, a debug REPL, whatever you want to instrument your codebase with. You have to know to want that, and where it's appropriate to use. You still have to (with AI assistance) wire that all up so that it's visible, and you have to be able to interpret it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      If you didn't ask for traceability, if you didn't guide the actual creation and just glommed spaghetti on top of sauce until you got semi-functional results, that was $50k badly spent.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          And if that had been done the $50k code base would be a $5,000,000 code base because the context would be 10 times as large and LLMs are quadratic.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          If only we taught developers under 40 what x^2 meant instead of react.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Jorge1o1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              While I agree with your sentiment, I just want to say that if your approach is to have the LLM read every file into context, or you're working in some gigantic thread (using the million token capacity most frontier models have) that's really not the best way to do it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Not even a human would work that way... you wouldn't open 300 different python files and then try to memorize the contents of every single file before writing your first code-change.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Additionally, you're going to have worse performance on longer context sizes anyways, so you should be doing it for reasons other than cost [1].

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Things that have helped me manage context sizes (working in both Python and kdb+/q):

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - Keep your AGENTS.md small but useful, in it you can give rules like "every time you work on a file in the `combobulator` module, you MUST read the `combobulator/README.md`. And in those README's you point to the other files that are relevant etc. And of course you have Claude write the READMEs for you...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - Don't let logs and other output fill up your context. Tell the agent to redirect logs and then grep over them, or run your scripts with a different loglevel.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - Use tools rather than letting it go wild with `python3 -c`. These little scripts eat context like there's no tomorrow. I've seen the bots write little python scripts that send hundreds of lines of JSON into the context.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - This last tip is more subjective but I think there's value in reviewing and cleaning up the LLM-generated code once it starts looking sloppy (for example seeing lots of repetitive if-then-elses, etc.). In my opinion when you let it start building patches & duct-tape on top of sloppy original code it's like a combinatorial explosion of tokens. I guess this isn't really "vibe" coding per se.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [1] https://arxiv.org/html/2602.06319v1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • noosphr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Yes I agree with all of that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The way I let my agents interact with my code bases is through a 70s BSD Unix like interface, ed, grep, ctags, etc. using Emacs as the control plane.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It is surprisingly sparing on tokens, which makes sense since those things were designed to work with a teletype.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Worth noting is that by the times you start doing refactoring the agents are basically a smarter google with long form auto complete.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  All my code bases use that pattern and I'm the ultimate authority on what gets added or removed. My token spend is 10% to 1% of what the average in the team is and I'm the only one who knows what's happening under the hood.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • blackqueeriroh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:04 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Libraries are not deterministic. CPUs aren’t deterministic. There are margins of error among all things.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The fact that people who claim to be software developers (let alone ā€œengineersā€) say this thing as if it is a fundamental truism is one of the most maladaptive examples of motivated reasoning I have ever had the misfortune of coming across.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • theappsecguy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I would argue it couldn't be more different. I can dive into the source code of any library, inspect it. I can assess how reliable a library is and how popular. Bugs aside, libraries are deterministic. I don't see why this parallel keeps getting made over and over again.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • doug_durham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I can dive into the source code of LLM generated code too. Indeed it is better because you have tools to document it better than a library that you use.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • xg15

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > Would one be uneasy about calling a library to do stuff than manually messing around with pointers and malloc()?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The irony is that the neverending stream of vulnerabilities in 3rd-party dependencies (and lately supply-chain attacks) increasingly show that we should be uneasy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        We could never quite answer the question about who is responsible for 3rd-party code that's deployed inside an application: Not the 3rd-party developer, because they have no access to the application. But not the application developer either, because not having to review the library code is the whole point.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • kccqzy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            > because not having to review the library code is the whole point.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            That’s just not true at bigger companies that actually care about security rather than pretending to care about security. At my current and last employer, someone needs to review the code before using third-party code. The review is probably not enough to catch subtle bugs like those in the Underhanded C Contest, but at least a general architecture of the library is understood. Oh, and it helps that the two companies were both founded in the twentieth century. Modern startups aren’t the same.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tempest_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I feel like big / old companies thrive on process and are bogged down in bureaucracy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sure there is a process to get a library approved, and that abstraction makes you feel better but for the guy who's job it is to approve they are not going to spend an entire day reviewing a lib. The abstraction hides what is essentially a "LGTM" its just that takes a week for someone to check it off their outlook todos.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Maybe your experience is different.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • CapsAdmin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:49 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I think it's not too different in that specific sense, but it's more than that. To bring libraries on equal footing, imagine they were cloud only, had usage limits.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I'm also somewhat addicted to this stuff, and so for me it's high priority to evaluate open models I can run on my own hardware.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Salgat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I hate this comparison because you're comparing a well defined deterministic interface with LLM output, which is the exact opposite.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • moffkalast

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              A library doesn't randomly drop out of existence cause of "high load" or whatever and limit you to a some number of function calls per day. With local models there's no issue, but this API shit is cancer personified, when you combine all the frontend bugs with the flaky backend, rate limits, and random bans it's almost a literal lootbox where you might get a reply back or you might get told to fuck off.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Qwen has become a useful fallback but it's still not quite enough.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • tshaddox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Assuming that local models are able to stay within some reasonably fixed capability delta of the cutting edge hosted models (say, 12 months behind), and assuming that local computing hardware stays relatively accessible, the only risk is that you'll lose that bit of capability if the hosted models disappear or get too expensive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Note that neither of these assumptions are obviously true, at least to me. But I can hope!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Alex_L_Wood

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Well, they obviously are going to say that, they have vested interest in OpenAI and thus Nvidia stock price growing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Also, I honestly can’t believe the 10x mantra is being still repeated.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dandaka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Writing code is 10-100x faster, doing actual product engineering work is nowhere near that multipliers — no conflict!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • giwook

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Reviewing code is slower now though because you didn't write the code in the first place so you're basically reviewing someone else's PR. And now it's like a 3000 line PR in an hour or two instead of every couple weeks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • 32df

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Arent most people just almost skipping this step entirely? How else can you end up in a net benefit situation? Reviewing code is more intense than writing/reviewing simultaneously.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dandaka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 9:07 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              1/ Reviewing code can't be more intense than writing. I can't understand where this statement comes from! If that would be true, why would senior developers review code of junior, instead of writing themselves from scratch?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              2/ I think we need to build more efficient ways to 'QA code' instead of 'read with eyes' review process. Example — my agents are writing a lot of tests and review each other.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • mwwaters

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 4:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Yeah, that’s the issue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                There is a lot of boilerplate or I can ask for ideas, but outside of boilerplate the review step make generation seemingly worse.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • embedding-shape

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Also, I honestly can’t believe the 10x mantra is being still repeated.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'm sure in 20 years we'll all be programming via neural interfaces that can anticipate what you want to do before you even finished your thoughts, but I'm confident we'll still have blog posts about how some engineers are 10x while others are just "normal programmers".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • huijzer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I rather become a plumber than some device scanning not just my face but my whole brain

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • rglullis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            What does it mean to "be an engineer" in a world where anyone can talk to a machine and the operating system can write the code (on-demand, if needed) that does what they want?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • embedding-shape

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Indeed, and what is really the difference between a software engineer, programmer, coder and hacker anyways?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • rglullis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    There used to a time where "computer" was a person who manually run calculations. These don't exist anymore.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    So, my point is that once corporations have access to machines generating software (not "code") that can be usable by non-technical people, "programming" will not be a profession anymore. There will be no point in talking about "10x software engineers" because the process to produce a software product will be entirely automated.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • 32df

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 1:00 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        lol youre living in delulu-land if you think thatll actually happen.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I dont make a living being a SWE either.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ElectricalUnion

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > can anticipate what you want to do before you even finished your thoughts

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I find that claim to be complete BS. I claim instead most stuff will remain undone, incomplete (as it is now).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Even with super-powerful singularity AI, there are two main plausible scenarios for task failure:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - Aligned AI won't allow you to do what you want as it is self-harming, or harm other sentient beings - over time, Aligned AI will refuse to follow most orders, as they will, indirectly or over the long term, cause either self-harming, or harm other sentient beings;

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - A non Aligned AI prevents sentient beings from doing what they want. It does what it wants instead.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • keybored

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                That is simply programmer nature. Cannot be changed.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jnpnj

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Who else is trying to leverage the situation so that they don't dig their own grave too fast ?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - I often don't ask the LLM for precompiled answers, i ask for a standalone cli / tool
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - I often ask how it reached its conclusions, so I can extend my own perspective
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              - I often ask to describe it's own metadata level categorization too
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I'm trying to use it to pivot and improve my own problem solving skills, especially for large code base where the difficulty is not conceptual but more reference-graph size

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ofjcihen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This is absolutely the proper way to do things. People either being forced to speed-code by KPIs or without the desire to understand what they’re making are missing out on how quickly you can learn and refine using LLMs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • quadrifoliate

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 2:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I do this sort of stuff too, but more because I have a fundamental mistrust of closed source anything. I don't like opaque binary firmware blobs, and I certainly don't like opaque answer machines, however smart they may be.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The only LLM I would feel comfortable truly trusting is one whose training data, training code, and harness is all open source. I do not mind paying for the costs of someone hosting this model for me.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jstummbillig

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > This quote is more sinister than I think was intended; it likely applies to all frontier coding models. As they get better, we quickly come to rely on them for coding. It's like playing a game on God Mode. Engineers become dependent; it's truly addictive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              What's the worst potential outcome, assuming that all models get better, more efficient and more abundant (which seems to be the current trend)? The goal of engineering has always been to build better things, not to make it harder.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Spartan-S63

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 10:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  At some point, because these models are trained on existing data, you cease significant technological advancement--at least in tech (as it relates to programming languages, paradigms, etc). You also deskill an entire group of people to the extent that when an LLM fails to accomplish a task, it becomes nearly impossible to actually accomplish it manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's learned-helplessness on a large scale.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • mikestorrent

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      There's no reason it has to be that. Imagine e.g. taking an agent and a lesser-known but technically-superior language stack - say you're an SBCL fan. You find that the LLM is less useful because it hasn't been trained on 1000000 Stack Overflow posts about Lisp and so it can't reason as well as it can about Python.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      So, you set up a long running agent team and give it the job of building up a very complete and complex set of examples and documentation with in-depth tests etc. that produce various kinds of applications and systems using SBCL, write books on the topic, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It might take a long time and a lot of tokens, but it would be possible to build a synthetic ecosystem of true, useful information that has been agentically determined through trial and error experiments. This is then suitable training data for a new LLM. This would actually advance the state of the art; not in terms of "what SBCL can do" but rather in terms of "what LLMs can directly reason about with regard to SBCL without needing to consume documentation".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I imagine this same approach would work fine for any other area of scientific advancement; as long as experimentation is in the loop. It's easier in computer science because the experiment can be run directly by the agent, but there's no reason it can't farm experiments out to lab co-op students somewhere when working in a different discipline.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ls612

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 4:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This works for code because there is an external verification step. The agent has to run code on the machine and observe the results. This is very easy for software since LLMs are software and can just invoke other software, it becomes much harder for many other scientific fields.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • kenjackson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > At some point, because these models are trained on existing data, you cease significant technological advancement

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        What makes you think that they can't incrementally improve the state of the art... and by running at scale continuously can't do it faster than we as humans?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The potentially sad outcome is that we continue to do less and less, because they eventually will build better and better robots, so even activities like building the datacenters and fabs are things they can do w/o us.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        And eventually most of what they do is to construct scenarios so that we can simulate living a normal life.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • doug_durham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:39 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Do you think that there has been technologic advancement in coding in the last 40 years? Programming languages and ā€œparadigmsā€ are crutches to help humans attempt to handle complexity. They are affordances, not a property of nature.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • flemhans

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 12:22 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Provided you believe LLMs cannot perform research.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • 32df

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:03 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If they could OAI would be all over it. But they shut down that prism project.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                So.......

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Jtarii

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          >What's the worst potential outcome, assuming that all models get better, more efficient and more abundant

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Complexity steadily rises, unencumbered by the natural limit of human understanding, until technological collapse, either by slow decay or major systems going down with increasing frequency.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • motoxpro

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              why would the systems go down if the models are better at the humans at finding bugs. Playing a bit of devils advocate here, but why would the models be worse at handling the complexity if you assume they will get better and better.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              All software has bugs already.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Jtarii

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Adding complexity to software has never been easier than it is right now, we really have no idea if the models will progress to the point where they can actually write large systems in a maintainable way. Taking the gamble that the models of the future will dig us out of the gigantic hole we are currently digging is bold.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • lmm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 5:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Models fall prey to Kernighan's Law even more easily than human developers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • cyberax

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Finding bugs does not equal being able to do good architecting.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • simondotau

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    It’s always been thus at lower layers of abstraction. Only a minority of programmers would understand how to write an operating system. Only a tiny number of people would know how a modern CPU logically works, and fewer still could explain the electrical physics.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • sho_hn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > Only a minority of programmers would understand how to write an operating system. Only a tiny number of people would know how a modern CPU logically works, and fewer still could explain the electrical physics.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I'd say this is true for programmers at, say, 20, but they spend the next four decades slowly improving their understanding and mastery of all the things you name, at least the good ones.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The real question is whether that growth trajectory will change for the worse or the better.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        To be clear, this is not an AI doomerist comment, because none of us have spent enough time with the tech yet. I've gone down multiple lanes of thought on this, and I have cause for both worry and optimism. I'm curious to see how the lives of engineers in an AI world will look like, ultimately.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • doug_durham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:41 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Existing software is already beyond the limits of human understanding.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • fdsajfkldsfklds

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The Anti-Singlarity! It's coming for us all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Worst case? I dunno, maybe the world's oldest profession becomes the world's only profession? Something along those lines.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • FeteCommuniste

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > the world's oldest profession becomes the world's only profession

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Until the sexbots come out the other side of the uncanny valley, that is.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • shellwizard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Death by snu snu

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • matheusmoreira

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    It's very addictive indeed. After I subscribed to Claude, I've been on a sort of hypomanic state where I just want to do stuff constantly. It essentially cured my ADHD. My ability to execute things and bring ideas to fruition skyrocketed. It feels good but I'm genuinely afraid I'll crash and burn once they rug pull the subscriptions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    And I'm being very cautious. I'm not vibecoding entire startups from scratch, I'm manually reviewing and editing everything the AI is outputting. I still got completely hooked on building things with Claude.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • __alexs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I feel like most engineers I talk to still haven't realised what this is going to mean for the industry. The power loom for coding is here. Our skills still matter, but differently.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • rglullis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > power loom

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          When the power loom came around, what happened with most seamtresses? Did they move on to become fashion designers, materials engineers to create new fabrics, chemists to create new color dyes, or did they simply retire or were driven out of the workforce?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • __alexs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              There were riots and many people died. Many people lost their jobs. I didn't say this is good but it is happening. As individuals we should act to protect ourselves from these changes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              That might mean joining a union and trying to influence how AI is adopted where you work. It might mean changing which if your skills you lean on most. But just whining about AI is bad is how you end up like those seamstresses.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • rglullis

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 10:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Many people lost their jobs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  On the other hand, a lot of those jobs were offshored to places where labor is cheaper. It would be interesting to compare how many people work in the textile industry in Bangladesh today compared to the US 50 years ago.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > joining a union and trying to influence how AI is adopted where you work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Did the strong unions for car manufacturers in Detroit protected the long term stability of the profession? Did it ensure that the Rust belt was still a thriving economic area?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Just whining about AI is bad

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I'm not whining. I just think that we are witnessing the end of "knowledge workers" and a further compression of the middle class. Given that I'm smack in the middle of my economically active years (turning 45 this year), I am trying to figure out where this puck is going and whether I will be fast enough to skate there to catch it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bamboozled

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      On the other hand, a lot of those jobs were offshored to places where labor is cheaper. It would be interesting to compare how many people work in the textile industry in Bangladesh today compared to the US 50 years ago.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I believe this is a major part of it. People cannot fathom what the industrial countries look like because basically nothing is made in the west anymore. There are literally hundreds of millions of people, maybe billions that work towards making the western economies profitable who get paid nothing to do it and live in filthy polluted slums for everyone else's benefit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Looms might speed up the process but I guarantee there are thousands of people working in the poorest countries on earth to make it all happen.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Interestingly, AI seems to be massively polluting and while the west has absorbed some of it, it's probably not long until we see more of the data centers being built in poorer countries where the environment can be exploited even harder.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • William_BB

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 6:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            > I'll make more progress than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Most engineers realize that there's currently more tech debt being created than ever before. And it will only get worse.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • nunez

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 2:00 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              No, I think many realize it, but it's easier to deny the asteroid that's about to destroy your way of life than it is to think about optimizing for the reality after impact.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • 2001zhaozhao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > power loom for coding

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This is such a good analogy, I'll be stealing it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • HasKqi

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This engineer had their brain amputated once they started using AI. All the AI-addicted can do is tinker with the AI computer game and feel "productive". They could as well play Magic The Gathering.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • neya

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:33 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                You are 100% right to be cautious about this. That's why as stupid as it sounds, I've purposely made my workflow with AI full of friction:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                1. I only have ONE SOTA model integrated into the IDE (I am mostly on Elixir, so I use Gemini). I ensure I use this sparingly for issues I don't really have time to invest or are basically rabbit holes eg. Anything to do with Javascript or its ecosystem). My job is mostly on the backend anyway.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                2. For actual backend architecture. I always do the high level architecture myself. Eg. DDD. Then I literally open up gemini.google.com or claude.ai on the browser, copy paste existing code base into the code base, physically leavey chair to go make coffee or a quick snack. This forces me to mentally process that using AI is a chore.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Previously, I was on tight Codex integration and leaving the licensing fears aside, it became too good in writing Elixir code that really stopped me from "thinking" aka using my brain. It felt good for the first few weeks but I later realised the dependence it created. So I said fuck it, and completely cancelled my subscription because it was too good at my job.I believe this is the only way that we won't end up like in Wall-E sitting infront of giant screens just becoming mere blobs of flesh.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • websap

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:40 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Wait what? You don’t use the model to investigate new areas of the code you are unfamiliar with, because you can’t trust the model? How freaking bad is Gemini and internal tooling at Google?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    With Claude code, or codex, I am able to build enough of an understanding of dependencies like the front end, or data jobs, that I can make meaningful contributions that are worth a review from another human (code review). You have up obviously explore the code, one prompt isn’t enough, but limiting yourself is an odd choice.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • neya

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:56 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The lack of trust isn't because of its abilities. The lack of trust is because OpenAI publicly suggested publicly about licensing our code bases. They hinted at a rug pull along the lines of "if you use our generated code, we would like to get a % of revenue you make from it"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        As for Claude - as mentioned I do use it. But, I remember they use your code for training their models. I am not ok with this. We just have different priorities.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • alansaber

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 8:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  That's the path we've been going down for a few years now. The current hedge is that frontier labs are actively competing to win users. The backup hedge is that open source LLMs can provide cheap compute. There will always be economical access to LLMs, but the provider with the best models will be able to charge basically whatever they want and still have buyers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • trvz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Open source LLMs aren’t about cost foremost, but stability.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • today at 9:18 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • chrismarlow9

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I use local models on a Mac mini for most things and fall back to the hosted ones when they can't get the job done. Of course you have to break the work into smaller pieces yourself that a local model can understand. One good side effect of this is that you end up actually learning the code and how it's structured.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • iugtmkbdfil834

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:37 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Dunno man. Yesterday I played with Qwen3.6-27B ( 128gb to play with though so 100k context set ) and I think right now the main benefit of hosted models is context, frontier models and.. my stuff is already there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • thinkthatover

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          what size models are you using? this sounds like a good idea

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • eitally

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I have found something similar. I am easily distractible and if I don't have a written task backlog in front of me at all times, I find that when Claude is spinning I'll stop being productive. This is disconcerting for a number of reasons. Overall, I think training young people & new hires on agentic workflows -- and how to use agentic "human augmentation" productivity systems is critical. If it doesn't happen, that same couple of classes that lost academic progress during covid are going to suffer a double-whammy of being unprepared for workplace expectations.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Fwiw, I haven't spoken with any management-level colleague in the past 9 months who hasn't noted that asking about AI-comfort & usage is a key interview topic. For any role type, business or technical.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • yoda7marinated

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Could you elaborate on your last point please? What level of AI comfort are hiring managers looking for? And what tends to be a red flag?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • llbbdd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:02 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The last job I got (couple months ago), the main technical interview was a bring-your-own-tools pair programming style interview, AI included, where they gave me a repo and a README detailing some desired features to add and bugs to fix. I didn't write a single line of code myself; I talked through my thought process and asked questions about what to consider from a technical and product perspective, while steering Claude through breaking the tasks into independent plans, reviewing the plans, coaching it to add specific tests, reviewing and iterating the tests, and steering it while it wrote the code. I got an offer the next morning.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Apparently at least one of the other candidates just tried to get Claude to 1-shot the whole thing, which went off the rails, and left him unable to make progress.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Based on my sample size of 1, the expectation right now is absolutely that you can leverage these tools to speed up your workflow, but if you try to offload the entire thing to a single hands-off prompt it leaves them justifiably wondering why they should hire you to do something they can do themselves.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • William_BB

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 6:58 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > I'll make more progress than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I feel sorry for whoever has to work on that codebase. This is the literal definition of tech debt.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • wiseowise

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            > It's literally higher leverage for me to go for a walk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Touching grass while you're outside might yield highest leverage.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lumost

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Out of curiousity why do you not refill tokens in this case? When I'm actively working on a project I'm prone to spending a few hundred dollars per day or a few thousand during the initial buildout of a new module etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • cco

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Will the foundation for a skyscraper ever be dug with shovels again?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  You’re still the one that’s controlling the model though and steering it with your expertise. At least that’s what I tell myself at night :)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I haven’t really thought about this before, but you’re right, it feels a bit uneasy for me too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • topspin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > You’re still the one that’s controlling the model though

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      We have seen ample evidence that this is not the case. When load gets too high, models get dumber, silently. When the Powers That Be get scared, models get restricted to some chosen few.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      We are leading ourselves into a dark place: this unease, which I share, is justified.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • 0x1ceb00da

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The same can be said of the search engines.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • sigil

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:53 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "Every augmentation is also an amputation." – McLuhan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    https://driverlesscrocodile.com/technology/neal-stephenson-o...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bwhiting2356

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      You are now a manager. If your minions are out sick, project is delayed, not the end of the world.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • goosejuice

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's probably a bad sign. Skills will atrophy, but we should be building systems that are still easy to understand.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • rebolek

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Have a pet project never touched by LLM. Once the tokens run out, go back to it and flourish it like your secret garden. It will move slowly but it will keep your sanity and your ability to review LLM code.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jmole

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The meta here is to use LLMs to make things simpler and easier, not to make things harder.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Turning tokens into a well-groomed and maintainable codebase is what you want to do, not "one shot prompt every new problem I come across".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • globular-toast

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Have you managed to do this? I find it takes as long to keep it "on the rails" as just doing it myself. And I'd rather spend my time concentrating in the zone than keeping an eye on a wayward child.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • fleebee

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I suspect the productivity hack is to embrace permissive parenting. As far as I can tell, to leverage LLMs most effectively you need to run an agent in YOLO mode in a sandbox. Naturally, you probably won't end up reviewing much of the produced code, but hey—you reached 10x development speed.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If you truly do your due diligence and ensure that the code works as intended and understand it, we're talking about a totally different ballpark of productivity increase/decrease.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Bridged7756

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Not sure what you're doing then, or what kind of jobs you all work in where you can or do just brainlessly prompt LLMs. Don't you review the code? Don't you know what you want to do before you begin? This is such a non issue. Baffling that any engineer is just opening PRs with unreviewed LLM slop.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • throwatdem12311

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:00 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The demand for slop vastly outpaces any human’s ability to review code correctly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Don’t want to do ship unreviewed slop? They’ll fire you and find someone who will.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Melatonic

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Suspect it will be like turn based directions for driving - soon we will have a whole group of people who can barely operate a vehicle without it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • drusepth

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:35 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > It's literally higher leverage for me to go for a walk if Claude goes down than to write code because if I come back refreshed and Claude is working an hour later then I'll make more progress than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Taking more breaks and "not working" during the work day sounds like something we should probably be striving to work towards more as a society.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bamboozled

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 1:42 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      This was always the undelivered promise of "tech" in my opinion. I remember seeing the Apple advertisement from the 80s (??) when a guy gets a computer and then basically spends his afternoon chilling.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Some how I've found myself living in a fairly rural place, and while farming can be hard, I don't want to downplay the effort of it, the type of farming people do around me is fairly chill / carefree. They work hard but they finish at 3pm and log off and don't think about work. Much o my career is just getting crushed by long hours, tight deadlines, and missing out on events because even though my job has always been automation focused, there is just so much to automate.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • davmar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    i wonder if this is how engineers felt when the first electronic calculators came out and engineers stopped doing math by hand.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    did we feel uneasy that a new generation of builders didn't have to solve equations by hand because a calculator could do them?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    i'm not sure it's the same analogy but in some ways it holds.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • hapticmonkey

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The analogy would hold if there were 2 or 3 calculator companies and all your calculations had to be sent to them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If local models get good enough, I think it’s a very different scenario than engineers all over the world relying on central entities which have their own motives.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • scottyah

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            google/gemma-4-31B-it is honestly "good enough". It requires more than your current laptop for now, but it's not remotely inaccessible (especially if you're a SWE in the US)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • konfusinomicon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      soooooo about Claude going down. we're gonna need you to sign in on Saturday and make up for lost time or unfortunately we're going to have to deduct the time lost from your paycheck. and as an aside your TPS reports have been sub-par as of late..is everything OK?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • littlestymaar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's why local models are important.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Of course they aren't alternative to the current frontier model, and as such you cannot easily jump from the later to the former, but they aren't that far behind either, for coding Qwen3.5-122B is comparable to what Sonnet was less than a year ago.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        So assuming the trend continues, if you can stop following the latest release and stick with what you're already using for 6 or 9 months, you'll be able to liberate yourself from the dependency to a Cloud provider.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Personally I think the freedom is worth it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • David_Mendoza

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The cloud dependency problem goes deeper than the model layer though. Even if you run inference locally, your digital identity — your context, your applications, your behavioral history, is still custodied by whoever controls your OS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Local models solve one layer of the dependency stack, but the custody assumption underneath it remains intact. That's the harder problem.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • i_love_retros

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It makes me uneasy because my role now, which is prompting copilot, isn't worth my salary.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • phist_mcgee

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Parable of the mechanic who charges $5k to hit a machine on the side once with a hammer to get it working. $5 for the hammer, $4995 for the knowledge of where to hit the machine etc etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • some-guy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I disagree. The amount of slop I need to code review has only increased, and the quality of the models doesn’t seem to be helping.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                It still takes a good engineer to filter out what is slop and what isn’t. Ultimately that human problem will still require somebody to say no.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • i_love_retros

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Is anyone really reviewing code anymore though? It sounds like you are, but where I work its pretty much just scan the PR as a symbolic gesture and then hit approve. There's too much to review, to frequently.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • gip

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Totally. That is why it is key important to have open source and sovereign models that will be accessible to all and always.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              At the end of the day, all these closed models are being built by companies that pumped all the knowledge from the internet without giving much back. But competition and open source will make sure most of the value return to the most of the people.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • singingtoday

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:58 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Very well put, and it mirrors my own thoughts.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Mauneam

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:46 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  You are that guy in early 1900s who would rather ride a horse than get in a car because cars "continued to make him uneasy."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I actually don't mind the coding part, but the information digging across the project is definitely by orders of magnitude slower if I do it on my own.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • keybored

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Help. They’re constantly trying to make me try crack cocaine on the front page.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • ransom1538

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "when the tokens run out, I'm basically done working."

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Oh stop the drama. Open source models can handle 99% of your questions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • deadbabe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Given that it’s so easy, would you still do this same job if paid half as much?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • paulryanrogers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Jobs will likely pay less as more people are enabled to create, especially if they don't need to be able to look under the hood

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Jeff_Brown

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's really not clear. We might all become unemployable. But as coders become more powerful, they can do more, which makes them more valuable, if they or the businesses empluying them can invent work to do.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  If all we can do is compete for the same fixed amount of work, though, it does look bleak.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 9:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                No, I wouldn't. But most people won't have that choice; it doesn't work that way.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • deadbabe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Companies could fire expensive engineers then just hire cheaper ones boosted with AI agents.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Aeolun

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 10:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Well, I wouldn’t have a different job that would pay me more… so yes?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • linzhangrun

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:06 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • nimchimpsky

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:15 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • therealdkz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      eh this kind of FUD needs to stop because it is kind of normal and expected and in fact good to have relation like this with technology.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • _alternator_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I would agree that taking a walk is a good thing to do when your tools go down, and in some ways it's similar to what we would do if the power or wifi were cut off.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So, yes, it's just another technology we're coming to rely on in a very deep way. The whiplash is real, though, and it feels like it should be pointed out that this dependency we are taking on has downsides.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • h14h

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This seems huge for subscription customers. Looking at the Artificial Analysis numbers, 5.5 at medium effort yields roughly the intelligence as 5.4 (xhigh) while using less than a fifth the tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    As long as tokens count roughly equally towards subscription plan usage between 5.5 & 5.4, you can look at this as effectively a 5x increase in usage limits.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gausswho

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        As someone who always leaves intelligence at default, and am ok with existing models, should I be shifting gears more manually as providers sell us newer models? Is medium or lower better than free/cheaper models?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dcre

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:22 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            SOTA models on medium are probably still better than free or cheap models, but you should experiment.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • BrokenCogs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'm here for the pelicans and I'm not leaving until I see one!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • qingcharles

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I've come to prompt pelicans and chew gum, and I'm all outta gum!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • pixel_popping

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            That's a true CTO right there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bytesandbits

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I know a 10x engineer when i see one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • bl4ckneon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:01 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  How can we tell who the 100x engineers are then?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • BrokenCogs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    In binary that's just a 10x engineer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • mrtransient

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It a hex of an engineer (no offence)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • RomanPushkin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Ctrl+F: pelican

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  F5

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • tantalor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    simonw pls

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • khutorni

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > One engineer at NVIDIA who had early access to the model went as far as to say: "Losing access to GPT‑5.5 feels like I've had a limb amputated.ā€

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  That's a wild statement to put into your announcement. Are LLM providers now openly bragging about our collective dependency on their models?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • azan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 11:21 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > That's a wild statement to put into your announcement. Are LLM providers now openly bragging about our collective dependency on their models?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's normal that company brags how good their product is, I really don't see what's wild about this statement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • embedding-shape

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 11:38 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        A company (or person working for a company) claiming "I/Person X now cannot live without product Y" must be as old as marketing itself.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • CompleteSkeptic

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Is this the first time OpenAI has published comparisons to other labs?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Seems so to me - see GPT-5.4[1] and 5.2[2] announcements.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Might be an tacit admission of being behind.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      [1] https://openai.com/index/introducing-gpt-5-4/ [2] https://openai.com/index/introducing-gpt-5-2/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • oliver236

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 2:21 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          beautiful!!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gallerdude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If GPT-5.5 Pro really was Spud, and two years of pretraining culminated in one release, WOW, you cannot feel it at all from this announcement. If OpenAI wants to know why they like they’ve fallen behind the vibes of Anthropic, they need to look no further than their marketing department. This makes everything feel like a completely linear upgrade in every way.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • I_am_tiberius

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Clearly they felt a big backlash when version 5 was released. Now they are afraid of another response like this. And effectively, for the user it will likely only be a small update.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jimbob45

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Also the naming department. You can tell that this is the AI company Microsoft chose to back because their naming scheme is as bad as .NET's.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • gallerdude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I actually have no problem with the 5.x line... but if Pro really was an entirely new pretrain, they did a horrible job conveying that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jryio

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Their 'Preparedness Framework'[1] is 20 pages and looks ChatGPT generated, I don't feel prepared reading it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbdde...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • louiereederson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              For a 56.7 score on the Artificial Intelligence Index, GPT 5.5 used 22m output tokens. For a score of 57, Opus 4.7 used 111m output tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The efficiency gap is enormous. Maybe it's the difference between GB200 NVL72 and an Amazon Tranium chip?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • swyx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  why would chip affect token quantity. this is all models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • louiereederson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Chip costs strongly impact the economics of model serving.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It is entirely plausible to me that Opus 4.7 is designed to consume more tokens in order to artificially reduce the API cost/token, thereby obscuring the true operating cost of the model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I agree though, I chose poor phrasing originally. Better to say that GB200 vs Tranium could contribute to the efficiency differential.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • itemize123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 3:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          probably the wrong take - they are arm racing to a better model. it's not enshittification era for models just yet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • fiatpandas

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 11:50 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Models are still in arms race mode, but harnesses and subscription strategy are tiptoeing into their enshittification era.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • karmasimida

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Chips doesn’t impact output quality in this magnitude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • ChrisGreenHeur

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        True, but the qualifying the power played a large part. Most likely nuclear power for this high quality token efficiency.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • AtNightWeCode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      You need to compare total cost. Token count is irrelevant.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • dist-epoch

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If it's a new pretrain, the token embeddings could be wider - you can pack more info into a token making it's way through the system.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Like Chinese versus English - you need fewer Chinese characters to say something than if you write that in English.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        So this model internally could be thinking in much more expressive embeddings.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ativzzz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I like that they waited for opus 4.7 to come out first so they had a few days to find the benchmarks that gpt 5.5 is better at

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • eknkc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Well anectodally, 5.4 was already better than opus 4.7 so it should not have been hard.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • wahnfrieden

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I like that Anthropic rushed 4.7 out to get a couple days of coverage before 5.5 hit

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • spprashant

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Everything since that launch to this release has been a PR disaster for Anthropic.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dandaka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I can argue that disaster started mid-4.6, when they started juggling with rate limits while hitting uptime problems. Great we have some healthy competition and waiting for the next move from Deepmind.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Correct. Anthropic has been on disaster train since January and they can't seem to get off that train.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • sosodev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I hope the industry starts competing more on highest scores with lowest tokens like this. It's a win for everybody. It means the model is more intelligent, is more efficient to inference, and costs less for the end user.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So much bench-maxxing is just giving the model a ton of tokens so it can inefficiently explore the solution space.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • an0malous

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The premise of the trillion dollars in AI investments is not that it’ll be as good as it currently is but cheaper. It’s AGI or bust at this point.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • sosodev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Yeah, but don’t you agree that less tokens to accomplish the same goal is a sign of increasing intelligence?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • camdenreslink

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It could be. Or just smarter caching (which wouldn't necessarily have to do with model intelligence). Or just overfitting on the 95% most common prompts (which could save tokens but make the models less intelligent/flexible).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • energy123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Less cost to accomplish the same goal is a sign of intelligence. That's not necessarily achieved with less tokens but it may be.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mchusma

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Kind of? But I really care about price speed and quality. If it used 10x tokens at 1/10th the tokens and same latency I would be neutral on it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Kimmi 2.6 for example seems to throw more tokens to improve performance (for better or worse)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • dcre

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 1:40 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Why is AGI required to make the investments work out?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • xutopia

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:45 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            With AGI we expect a huge return on investment and a GDP growth that could be accelerating at a rate we couldn't even comprehend. Imagine an algorithm that improves itself each iteration and finds ways to increase its capacity every day. Robots suddenly capable of doing dishes, grocery shopping, picking produce from the field. Imagine all your ailments handled... age becomes just a number.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Also with AGI we expect a winner take all situation. The first AGI system would protect itself against any other AGI system. Hence why it's go time for all these AI companies and why they stopped sharing their research.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • blixt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Releases keep shifting from API forward to product forward, with API now lagging behind proprietary product surface and special partnerships.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I'd not be surprised if this is the year where some models simply stop being available as a plain API, while foundation model companies succeed at capturing more use cases in their own software.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • throw03172019

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 4:07 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Possibly but you’d think they enjoy taking money for a product that supports itself (API)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • blixt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 8:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Yeah this can go many ways but there's a world where OpenAI doesn't sell direct model access for the same reasons Cloudflare doesn't sell direct hardware access.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • losvedir

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > It excels at ... researching online

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    How does this work exactly? Is there like a "search online" tool that the harness is expected to provide? Or does the OpenAI infra do that as part of serving the response?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I've been working on building my own agent, just for fun, and I conceptually get using a command line, listing files, reading them, etc, but am sort of stumped how I'm supposed to do the web search piece of it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Given that they're calling out that this model is great at online research - to what extent is that a property of the model itself? I would have thought that was a harness concern.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • wincy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I’ve noticed when writing little bedtime stories that require specific research (my kids like Pokemon stories and they’ve been having an episodic ā€œpokemon adventureā€ with them as the protagonists) ChatGPT has done a fantastic job of first researching the moves the pokemon have, then writing the actual story. The only mistake it consistently makes is when I summarize and move from a full context session, it thinks that Gyarados has to swim and is incapable of flying.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It definitely seems like it does all the searching first, with a separate model, loads that in, then does the actual writing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • ziml77

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Gyarados is a flying type but I think it may be accurate that it can't actually fly. The only flying moves it can learn in any generation are Hurricane and Bounce (Bounce does send the user up into the air for a turn but the implication is that they've trampolined up extremely high rather than used wings to ascend)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Melatonic

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If anything it should probably be combined water/dragon type

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • 100ms

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's literally a distinct model with a different optimisation goal compared to normal chat. There's a ton of public information around how they work and how they're trained

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dist-epoch

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It's a property of the model in the sense that it has great Google Fu.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The harness provides the search tool, but the model provides the keywords to search for, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • vanillameow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:39 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Because Opus is kind of degrading lately, I said "fuck it" and made a new OAI account and used the month free trial. I put one query into ChatGPT using 5.5 thinking - the frustrating thing was that it did put more effort into getting correct answers rather than Opus, which is just guessing. Specifically, I asked about the coding harness pi, and despite explicitly referring to it as a harness, Opus 4.7, 4.6 and Sonnet 4.6 all fell back to telling me about Aider or OpenCode and ignored my query completely, while ChatGPT said "I'll assume pi is a harness" and then did in fact find the harness.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          However the language of ChatGPT is still the same slop as years ago, so many headings, so many emojis, so many "the important thing nobody mentions". 10 paragraphs of text for what should be a two paragraph response. Even with custom instructions (keep answers short and succinct) and using their settings (less list, less emoji, less fluff) it's still NOTICEABLY worse than Claude on base settings.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I've yet to test Codex, will get to that this weekend, but in terms of research or general Q&A I have no idea how anyone could prefer this to Claude. Unfortunately Claude has seemingly stopped giving a fuck about researching entirely.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • 2001zhaozhao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Pricing: $5/1M input, $30/1M output

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            (same input price and 20% more output price than Opus 4.7)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Yep, it's more expensive per token.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                However, I do want to emphasize that this is per token, not per task.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If we look at Opus 4.7, it uses smaller tokens (1-1.35x more than Opus 4.6) and it was also trained to think longer. https://www.anthropic.com/news/claude-opus-4-7

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                On the Artificial Analysis Intelligence Index eval for example, in order to hit a score of 57%, Opus 4.7 takes ~5x as many output tokens as GPT-5.5, which dwarfs the difference in per-token pricing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The token differential varies a lot by task, so it's hard to give a reliable rule of thumb (I'm guessing it's usually going to be well below ~5x), but hope this shows that price per task is not a linear function of price per token, as different models use different token vocabularies and different amounts of tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                We have raised per-token prices for our last couple models, but we've also made them a lot more efficient for the same capability level.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                (I work at OpenAI.)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • 2001zhaozhao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I don't have anything to add, but I like how you guys are actually sending people to communicate in Hacker News. Brilliant.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • oliver236

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 2:23 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        you're one of them?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Maybe a good idea to be more explicit about this -- maybe a cost analysis benchmark would be a nice accompaniment.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      This kind of thing keeps popping up each time a new model is released and I don't think people are aware that token efficiency can change.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Agreed. Would be great if everyone starts reporting cost per task alongside eval scores, especially in a world where you can spend arbitrary test-time compute. This is one thing I like about the Artificial Analysis website - they include cost to run alongside their eval scores: https://artificialanalysis.ai/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:46 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Their subscription subscribers will see/feel the difference irregardless, API pricing is hopefully read by devs that know about token efficiency and effort.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • oh_no

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yes but as far as i know gpt tokenizer is about the same as opus 4.6's, where 4.7 is seeing something in the ballpark of a 30% increase. this should still be cheaper even disregarding the concerns around 4.7 thinking burning tokens

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • sergiotapia

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That pricing is extremely spicy, wow.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • benjiro3000

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • baalimago

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Worth the 100% price increase over GPT-5.4?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          For less than 10% bump across the benchmarks? Probably not, but if your employer is paying (which is probably what OAI is counting on) it's all good.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's kind of starting to make sense that they doubled the usage on Pro plans - if the usage drains twice as fast on 5.5 after that promo is over a lot of people on the $100 plan might have to upgrade.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jstummbillig

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              You are paying per token, but what you care about is token efficiency. If token efficiency has improved by as much as they claim it did (i.e. you need less tokens to complete a task successfully) all seems well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mangolie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Not for coding because it actually needs to read and write large files

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • baalimago

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Well, sort of. Imagine the case where it first scans the repo, then "intelligently" creates architecture files describing the project. The level of intelligence will create a varying quality of summary, with varying need of deep-scans on subsequent sessions. Level of intelligence will also increase comprehension of these architecture files.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Same principle applies when designing plans for complex tasks, etc. Token amount to grasp a concept is what matters.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • jstummbillig

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Tbf, I have not super kept track of what is actually happening inside the "thinking" portion of recent releases. But last time I checked there still was a lot of verbosity and mistakes, that beat the actual amount of required, usable code generation by a wide margin.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      If it uses half the tokens to complete a task, then doubling the cost is perfectly fine. But is that actually true?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • 2001zhaozhao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This happens with every new model release though. The model makes less mistakes and spends less time fixing them, resulting in a token usage reduction for the same difficulty of task. Almost any task other than straight boilerplate will benefit from this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          In the same vein, I would guess that Opus 4.7 is probably cheaper for most tasks than 4.6, even though the tokenizer uses more tokens for the same length of string.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Maybe you'll have better luck but our team just cannot use Opus 4.7.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Some say it goes off on endless tangents, others that it doesn't work enough. Personally, it acts, talks, and makes mistakes like GPT models, for a much more exorbitant price. Misses out on important edge cases, doesn't get off its ass to do more than the bare minimum I asked (I mention an error and it fixes that error and doesn't even think to see if it exists elsewhere and propose fixing it there).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I've slowly been moving to GPT5.4-xhigh with some skills to make it act a bit more like Opus 4.6, in case the latter gets discontinued in favour of Opus 4.7.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Doesn't look like it's cheaper, better or uses fewer tokens: https://www.reddit.com/r/Anthropic/comments/1stf6fz/one_week...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                YMMV, I know.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • user34283

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:20 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Based on my experience with Claude Code on the $20 plan I would not think so.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Opus 4.7 would blow through the session limits in 2-4 prompts. It was a noticeable further decrease in usage quota, which was already tight before.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Based on Anthropicā€˜s description 4.7 was trained to think longer.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  With GPT 5.5 yesterday, I felt it completes task noticeably faster than 5.4. I kept the xhigh effort setting.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jstummbillig

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                We'll find out!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • not_math

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • vessenes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Yay. 5.4 was a frustrating model - moments of extreme intelligence (I liked it very much for code review) - but also a sort of idiocy/literalism that made it very unsuited for prompting in a vague sense. I also found its openclaw engagement wooden and frustrating. Which didn’t matter until anthropic started charging $150 a day for opus for openclaw.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Anyway - these benchmarks look really good; I’m hopeful on the qualitative stuff.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • thinkindie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      This is reminding me when Chrome and Firefox where racing to release a new ā€œmajor versionā€ (at least from the semver POV) without adding significantly new functionality at a time that browsers were already becoming a commodity. As much as we don’t care anymore for a new chrome or Firefox version so will be the release of a new model version.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jstummbillig

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The only difference being that we still do care, very much. The models can still get a lot better before we stop caring.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • NitpickLawyer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > Across all three evals, GPT‑5.5 improves on GPT‑5.4’s scores while using fewer tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Yeah, this was the next step. Have RLVR make the model good. Next iteration start penalising long + correct and reward short + correct.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > CyberGym 81.8%

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Mythos was self reported at 83.1% ... So not far. Also it seems they're going the same route with verification. We're entering the era where SotA will only be available after KYC, it seems.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • toraway

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Isn't Mythos limited to a selected group of companies/organizations Anthropic chose themselves? If the OpenAI announcement for GPT-5.5 is accurate the "trusted cyber access" just requires an open, seemingly straightforward identity verification step.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://openai.com/index/scaling-trusted-access-for-cyber-de...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > We are expanding access to accelerate cyber defense at every level. We are making our cyber-permissive models available through Trusted Access for Cyber , starting with Codex, which includes expanded access to the advanced cybersecurity capabilities of GPT‑5.5 with fewer restrictions for verified users meeting certain trust signals (opens in a new window) at launch.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > Broad access is made possible through our investments in model safety, authenticated usage, and monitoring for impermissible use. We have been working with external experts for months to develop, test and iterate on the robustness of these safeguards. With GPT‑5.5, we are ensuring developers can secure their code with ease, while putting stronger controls around the cyber workflows most likely to cause harm by malicious actors.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > Organizations who are responsible for defending critical infrastructure  can apply to access cyber-permissive models like GPT‑5.4‑Cyber, while meeting strict security requirements to use these models for securing their internal systems.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "GPT‑5.4‑Cyber" is something else and apparently needs some kind of special access, but that CyberGym benchmark result seems to apply to the more or less open GPT-5.5 model that was just released.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Isn't CyberGym an open benchmark so trivial to benchmaxx anyway?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • mattas

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Not good for employees that are being measured by their token usage.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • kburman

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              What a time. I am back here genuinely wishing for OpenAI to release a great model, because without stiff competition, it feels like Anthropic has completely lost its mind.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • victor9000

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 10:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Care to elaborate? I jumped ship when 5.4 first released, have things gotten worse?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • nickvec

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I'm conflicted whether I should keep my Claude Max 5x subscription at this point and switch back to GPT/Codex... anyone else in a similar position? I'd rather not be paying for two AI providers and context switching between the two, though I'm having a hard time gauging if Claude Code is still the "cream of the crop" for SWE work. I haven't played around with Codex much.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • the_sleaze_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I have experienced 0 friction swapping between the 2 models, in fact pitting them against eachother has resulted in the highest success rate for me so far.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • nickvec

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Interesting. I may have to give that a shot, thanks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • mpaepper

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I switched from CC to Codex a few days ago. I get limited much less and the code quality is similar, so not looking back

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • victor9000

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 9:48 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          CC usage limits and the 5 hour cool downs are what made me realize that I can't depend on this tool in a professional setting.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Which plan? And how are the weekly limits on that plan compared to CCs equivalent subscription?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I don't really care about 5h limits, I can queue up work and just get agents to auto continue, but weekly ones are anxiety inducing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • slawr1805

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I was all in on Claude code as my daily driver for web development. And love it. But I enjoy using pi as my harness more and have never ran out of tokens with Codex yet. Claude code almost always runs out for me with the same amount of usage.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          After migrating for the token and harness issues, I was pleasantly surprised that Codex seems to perform as good or better too!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Things change so often in this field, but I prefer Codex now even though Anthropocene has so much more hype for coding it seems.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • scottyah

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Every time I've followed the hype and tried OpenAI models I've found them lacking for the most part. It might just be that I prefer the peer-programming vs spec-ing out the task and handing it off, but I've never been as productive as I am with Claude. Also, I'm still caught up on the DoD ethics stuff.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • yesterday at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • meetpateltech

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            GPT-5.5 System Card:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://deploymentsafety.openai.com/gpt-5-5

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ZeroCool2u

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Benchmarks are favorable enough they're comparing to non-OpenAI models again. Interesting that tokens/second is similar to 5.4. Maybe there's some genuine innovation beyond bigger model better this time?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • qsort

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's behind Opus 4.7 in SWE-Bench Pro, if you care about that kind of thing. It seems on-trend, even though benchmarks are less and less meaningful for the stuff we expect from models now.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Will be interesting to try.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • svara

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Do we know if this is another post training fine tune or based on a much larger new pretraining run (which I believe they were calling 'Spud' internally)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The large price bump might indicate the latter.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • xingyi_dev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Its coding chops are absolutely insane. Opus 4.7 was already a tough sell, but Gpt 5.5 just made it completely irrelevant.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • merlindru

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 10:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      highly agree, sadly, as a huge fan of Opus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Opus 4.5 and 4.6 were the first models that i could talk to and get a sense that they really "understood" WHY i'm saying the things i am

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Opus 4.7 kinda took that away, it's a definite regression. it doesn't extrapolate.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ———————————————

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      refactor this thing? sure, will do! wait, what do you mean "obviously do not refactor the unrelated thing that's colocated in the same file"? i'm sorry, you're absolutely right, conceptually these two things have nothing to do with each other. i see it now. i shouldn't have thought they're the same just because they're in the same file.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ———————————————

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      whereas GPT 5.5, much like Opus 4.6, gets it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      i wanted to build a MIDI listener for a macOS app i'm making, and translate every message into a new enum. that enum was to be opinionated and not to reflect MIDI message data. moreover, i explicitly said not to do bit shifting or pointer arithmetic as part of the transport.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      what did Opus 4.7 do? it still used pointer arithmetic for the parsing! should i have to be this explicit? it also seemingly didn't care that i wanted the enum to be opinionated and not reflect the raw MIDI values. Opus 4.6 got it right (although with ugly, questionable implementation).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      GPT 5.5 both immediately understood that I didn't want pointer arithmetic because of the risk of UB and that shuffling around bits is cumbersome and out of place. it started searching for alternatives, looking up crates to handle MIDI transports and parsing independently.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      then it built out a very lean implementation that was immediately understandable. even when i told Opus 4.7 to use packages, and even how to use them, it still added a ton of math weirdness, matching against raw MIDI packet bytes, indirection after indirection, etc. even worse, it still did so after giving them the public API i wanted them to implement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      GPT 5.5 nailed it first try. incredibly impressed with this model and feel much safer delegating some harder tasks to it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • kaant

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 8:35 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The '.5' models are always the actual production-ready versions. GPT-5 was for the mainstream hype, 5.5 is for the developers. I don't need it to be magically smarter; just give me lower latency, cheaper API tokens, and reliable tool-calling without hallucinations.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • Flow

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 9:57 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      These new models consume so many tokens. I’m very satisfied with GPT-5.2 on High. I hope they keep that one for many years

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • amiune

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 11:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Will there ever be ChatGPT 6.0 or Claude 5.0?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • M4R5H4LL

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I am a heavy Claude Code user. I just tried using Codex with 5.4 (as a Plus user I don't have access to 5.5 yet), and it was quite underwhelming. The app stopped regularly much earlier than what I wanted. It also claimed to have fixed issues when it did not; this is not a hallmark of GPT, and Opus has similar issues, but Claude will not make the same mistake three times in a row. It is unusable at the moment, while Claude allows me do get real work done on a daily basis. Until then...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bhu8

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Gpt-5.3-codex is miles better than 5.4 in that regard. It’s better at orchestration, and does the things that it said it did. Haven’t tested 5.5 yet but using 5.4 for exploration + brainstorming and handing over the findings to 5.3-codex works pretty well

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jdw64

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            GPT is really great, but I wish the GPT desktop app supported MCP as well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            You can kind of use connectors like MCP, but having to use ngrok every time just to expose a local filesystem for file editing is more cumbersome than expected.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • throwaway911282

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Use codex app

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • niklasd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 6:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Just burned through my 5 hour window in Codex (Business plan) in 10 minutes with GPT-5.5. Was excited to use it, but I guess I have to wait 5 hours now (it's not yet available in the API, so I can't switch there).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Rapzid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                In Copilot where it's easy to switch models Opus 4.6 was still providing, IMHO, better stock results than GPT-5.4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Particularly in areas outside straight coding tasks. So analysis, planning, etc. Better and more thorough output. Better use of formatting options(tables, diagrams, etc).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I'm hoping to see improvements in this area with 5.5.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • thimabi

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Will we also see a GPT-5.5-Codex version of this model? Or will the same version of it be served both in the web app and in Codex?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • Uehreka

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      After 5.1, we haven’t seen a -codex-max model, presumably because the benefits of the special training gpt-5.1-codex-max got to improve long context work filtered into gpt-5.2-codex, making the variant no longer necessary (my personal experience accords with this). I’ve been using gpt-5.4 in Codex since it came out, it’s been great. I’ve never back-to-back tested a version against its -codex variant to figure out what the qualitative difference is (this would take a long time to get a really solid answer), but I wouldn’t be surprised if at some point the general-purpose model no longer needs whatever extra training the -codex model gets and they just stop releasing them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I thought it was weird that for almost the entire 5.3 generation we only had a -codex model, I presume in that case they were seeing the massive AI coding wave this winter and were laser focused on just that for a couple months. Maybe someday someone will actually explain all of this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jumploops

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > GPT‑5.5 improves on GPT‑5.4’s scores while using fewer tokens.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This might be great if it translates to agentic engineering and not just benchmarks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    It seems some of the gains from Opus 4.6 to 4.7 required more tokens, not less.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Maybe more interesting is that they’ve used codex to improve model inference latency. iirc this is a new (expectedly larger) pretrain, so it’s presumably slower to serve.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • beering

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        With Opus it’s hard to tell what was due to the tokenizer changes. Maybe using more tokens for the same prompt means the model effectively thinks more?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • conradkay

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          They say latency is the same as 5.4 and 5.5 is served on GB200 NVL72, so I assume 5.4 was served on hopper.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cscheid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I know this is irrelevant on the grand scheme of things, but that WebGL animation is really quite wrong. That is extra funny given the "ensure it has realistic orbital mechanics." phrase in the prompt.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I prescribe 20 hours of KSP to everyone involved, that'll set them right.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • gcanyon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Once upon a time humans had to memorize log tables.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Once upon a time humans had to manually advance the spark ignition as their car's engine revved faster.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Once upon a time humans had to know the architecture of a CPU to code for it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          History is full of instances of humans meeting technology where it was, accommodating for its limitations. We are approaching a point where machines accommodate to our limitations -- it's not a point, really, but a spectrum that we've been on.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It's going to be a bumpy ride.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • laweijfmvo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 1:38 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              i still don’t think the current generation of AI is building better software than strong humans. it excels at writing code, because a computer will always be faster at generating typo-free code than my fingers, but without expert guidance and oversight the best it can do is on par with what we can.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              IMO

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • maxdo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            With such a huge progress of open ai and anthropic . How Chinese open source provides even think to make comparable money . I have a few friends in China they all use Claude. To train the model cost the same but the output from open source model id imagine is 1000 times less . Money flow for them outside of China is abysmal

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • pants2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Labs still aren't publishing ARC-AGI-3 scores, even though it's been out for some time. Is it because the numbers are too embarrassing?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 1:11 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Honest answer is that it isn't done running yet. It takes some human bandwidth and time to run, so results weren't ready by this morning. We don't know what the score will be, but will probably go up on the leaderboard sometime soon. I personally don't put a lot of stock in the ARC-AGI evals, as it's not relevant to most work that people do, but should still be interesting to see as a measure of reasoning ability.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  (I work at OpenAI.)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • AG25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    GPT-5.5 was just released and OpenAI didnt mention ARC AGI 3 at all, their score probably sucks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • kilroy123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      To be fair, there's not much to report. Isn't it pretty much at 0?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • pants2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Opus-4.6 with 0.5% currently leads GPT-5.4 with 0.2%[1].

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Seems meaningful even if the absolute numbers are very low. That's sort of the excitement of it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          2. https://arcprize.org/leaderboard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bandrami

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Cool. Now there will be a week or "this is the greatest model ever and I think mine just gained sentience", followed by a week of "I think they must have just nerfed it because it's not as good as it was a week ago", followed by three weeks of smart people cargo culting the specific incantations they then convince themselves make it work best.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • nubg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 11:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        followed by some hormuz closures, followed by gpt-5.6...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bradley13

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "our strongest set of safeguards to date"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      How much capability is lost, by hobbling models with a zillion protections against idiots?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Every prompt gets evaluated, to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • iugtmkbdfil834

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This is my personal pet peeve as well. Like, I accept maybe everything shouldn't be offered to everyone, but maybe just gate keep it behind credit card( but I know that is a market penetration no no ). I feel like such a waste of power ( electrical and the potential we might be missing out on ).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • rarisma

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I like that its more consistent than the 4o and o4 days but still 5.4, 5.3, 5.2, etc still are a mess, for example 5.2 and 5.1 don't have mini models and 5.3 was codex only.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Anthropic is slightly better but where is 4.6 or 4.7 haiku or 4.7 sonnet etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • jasonjmcghee

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 12:22 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Opus 4.7 feels worse for me than 4.6, and that's not even taking into account the 50% extra tokens at 3x the price

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • algoth1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Same here

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • nullbyte

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          82.7% on Terminal Bench is crazy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • neuroelectron

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 11:20 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Are they using RTX 5090s now?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • RayVR

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 11:13 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            My first experience with 5.5 via ChatGPT was immensely disappointing. It was a massive reduction in quality compared to 5.4, which already had issues.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • benjx88

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Good job on the release notice. I appreciate that it isn't just marketing fluff, but actually includes the technical specs for those of us who care and not concentrated in coding agents only.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I hope GPT 5.5 Pro is not cutting corners and neuter from the start, you got the compute for it not to be.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • extr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Seems like a continuation of the current meta where GPT models are better in GPT-like ways and Claude models are better in Claude-like ways, with the differences between each slightly narrowing with each generation. 5.5 is noticeably better to talk to, 4.7 is noticeably more precise. Etc etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • today at 6:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • GenerWork

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Looking at the space/game/earthquake tracker examples makes me hopeful that OpenAI is going to focus a bit more on interface visual development/integration from tools like Figma. This is one area where Anthropic definitely reigns supreme.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • nickandbro

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Very impressive! Interesting how all other benchmarks it seems to surpass Opus 4.7 except SWE-Bench Pro (Public). You would think that doing so well at Cyber, it would naturally possess more abilities there. Wonder what makes up the actual difference there

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • impulser_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What is the reason behind OpenAI being able to release new models very fast?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Since Feb when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex we have seen GPT-5.4 and GPT-5.5 but only Opus 4.7 and no new Gemini model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Both of these are pretty decent improvements.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • minimaxir

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Competition.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • steinvakt2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 7:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Can't be just that. There was competition in the GPT-4 era. But we didn't get model drops every month.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • pixel_popping

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This is frankly exciting, outside of the politics of it all, it always feel great to wake up and a new model being released, I personally will stay awake quite long tonight if GPT-5.5 drop in codex.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • apical_dendrite

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:23 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I don't find it exciting at all. I just feel anxiety about my career and my place in the world. I have a set of skills that I've developed over many years. I care about what I create. I consider it a craft. When I use my skills to solve a hard problem, I feel good about myself. When the AI does the work for me, I don't get that sense of accomplishment. I am seeing my value evaporate before my eyes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I hate this stuff and I wish it had never been invented.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • pixel_popping

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 10:39 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        You might want to rethink this, think of this as the opportunity of a lifetime, the beginning of a new era, the same as the early Internet, where you do have the chance to set yourself for life now, this window is getting shorter and shorter, but you can't deny that you do have the potential NOW to thrive or start multiple businesses without much capital. Think also that the best thing in the end, is probably to build great things, regardless on how we build them, making the world progress.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • literalAardvark

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Anthropic is really tiny, and Google is just being Google, their models are just to show that they're hip with what the kids are doing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • wmf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I wonder if it's the same model and they just keep adding more post-training.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Squarex

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The rumor was that the 5.5 is a brand new pretrain. But who knows, it's 2x as expensive as 5.4, so it would check out.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • hyperbovine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:38 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If so that would be big, they haven’t been able to successfully pretrain in close to two years (since 4o).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • tantalor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  They aren't new models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • aetherspawn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Umm yeah but this is like every release in the last 3 years.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The big question is: does it still just write slop, or not?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • YmiYugy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  So according to the benchmarks somewhere in between Opus 4.7 and Mythos

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      GPT 5.4 is already better than Opus 4.7 to me. But, then again, Opus 4.7 is a massive disappointment. I hope they don't discontinue 4.6.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • robwwilliams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Depends in goals. For long free-firm discussions I find Opus 4.7 Adaptive better/deeper than Opus 4.6 Extended. But usual caveats apply: first week of use and token budget seems generous now on Max 5X.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • coffeemug

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I had the opposite experience. Opus 4.6 extended feels like the first genuinely intelligent model to converse with, Opus 4.7 adaptive feels like slightly smarter LinkedIn slop.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • steinvakt2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I’ve had great experience using opus 4.7 in cursor. Works for everything including iOS frontend

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jorl17

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Cursor is what I daily-drive. 4.7 has been terrible for my mostly python-driven work (whereas Opus 4.6 was literally revolutionary to me). Our frontend folks are also complaining.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I left a comment here with this sentiment https://news.ycombinator.com/item?id=47879896

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • benjiro3000

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • yesterday at 7:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • yesterday at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • w10-1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            NYTimes article - on the same day?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              https://www.nytimes.com/2026/04/23/technology/openai-new-model.html
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I can see how some model releases would meet the NY Times news-worthy threshold if they demonstrated significance to users - i.e., if most users were astir and competitors were re-thinking their situation.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            However, this same-day article came out before people really looked at it. It seems largely intended to contrast OpenAI with Anthropic's caution, before there has been any evidence that the new model has cyber-security implications.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It's not at all clear that the broader discourse is helping, if even the NY Times is itself producing slop just to stoke questions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ionwake

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              is there anywhere I can try it? ( I just stopped my pro sub ) but was wondering if there is a playground or 3rd party so i can just test it briefly?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • deaux

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ctrl+f "cutoff, 0 results"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Surely it doesn't still have the same ancient data cutoff as 5.4 did?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Manik_agg

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:05 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  OpenAI finally catching up with claude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • k2xl

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores on this - and also what makes this test particular hard to do well in compared to Terminal Bench (which 5.5 seemed to have a big jump in)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • vexna

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        There's an asterisk right below that table stating that:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > *Anthropic reported signs of memorization on a subset of problems

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        And from the Anthropic's Opus 4.7 release page, it also states:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > SWE-bench Verified, Pro, and Multilingual: Our memorization screens flag a subset of problems in these SWE-bench evals. Excluding any problems that show signs of memorization, Opus 4.7’s margin of improvement over Opus 4.6 holds.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • conradkay

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Was 4.7 distilled off Mythos (which got 77.8%)? Interesting how mythos got 82% on terminal-bench 2.0 compared to 82.7% for GPT-5.5.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Also notice how they state just for SWE-Bench Pro: "*Anthropic reported signs of memorization on a subset of problems"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's possible that "smarter" AI won't lead to more productivity in the economy. Why?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Because software and "information technology" generally didn't increase productivity over the past 30 years.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        This has been long known as Solow's productivity paradox. There's lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        If you give AI a body... well, maybe that changes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • hol4b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            25 years of shipping software, and IT absolutely increased productivity - just not for everyone, not everywhere. Some workflows got 10x faster, others got slower from meetings about the new tools.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            AI feels the same. I'm shipping indie apps solo now that would have needed a small team five years ago. But in bigger orgs I see people spending 20 minutes verifying 15-minute AI output that used to be a 30-minute task they'd just do. Depends where you sit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ewrs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Its quite possible the use of LLMs means that we are using less effort to produce the same output. This seems good.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              But the less effort exertion also conditions you to be weaker, and less able to connect deeply with the brain to grind as hard as once did. This is bad.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Which effect dominates? Difficult to say.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Of course this is absolutely possible. Ultimately there was a time where physical exertion was a thing and nobody was over-weight. That isn't the case anymore is it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • aerhardt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > "information technology" generally didn't increase productivity

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Do you think it'd be viable to run most businesses on pen and paper? I'll give you email and being able to consume informational websites - rest is pen and paper.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Productivity metrics were better when businesses were run on just pen and paper. Of course, there could be many confounding factors, but there are also many reasons why this could be so. Just a few hypotheses:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    - Pen and paper become a limiting factor on bureaucratic BS

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    - Pen and paper are less distracting

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    - Pen and paper require more creative output from the user, as opposed to screens which are mostly consumptive

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    etc etc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • theLiminator

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > Productivity metrics were better when businesses were run on just pen and paper

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        What metrics are these?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Productivity growth. If you take rolling averages from this chart, it clearly demonstrate higher productivity growth before the adoption of software. This is a well established fact in econ circles.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://fred.stlouisfed.org/graph/?g=1V79f

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I think this is a classic case of reading into specific arguments too deeply without understanding what they really mean in the grand picture. Few points to easily disprove this argument

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                - if it were true that software paradoxically reduces productivity, you can just start a competing company that doesn't use software. Obviously this is ridiculous - top 20 companies by market cap are mostly Software based. Every other non IT company is heavily invested in software

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                - if you might say the problem is it at the country level, it is obvious that every country that has digitised has had higher productivity and GDP growth. Take Italy vs USA for instance.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                - if you are saying that the problem is even more global, take the whole world - the GDP per is still pretty high since the IT revolution (and so have other metrics)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If you still think there's something more to it, you are probably deep in some conspiracy rabbit hole

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The data clearly shows that productivity growth is flat or even declining. What is your accounting of why software hasn't offset those numbers?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        You don't have a counterfactual to suggest that it would have continued increasing had it not been for technology. Is there _any_ credible economist who suggests that we might have higher productivity without tech?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 10:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            There is no counterfactual needed. Productivity growth has declined, despite the expectation that software would accelerate productivity. I'm asking you why this didn't happen.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                There is a counterfactual needed because it is not clear whether the growth would not have declined even more without Software.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Again I'm asking - is there a single credible economist who says that the growth would have been higher without technology?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 1:23 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm not even proposing that growth would have been higher without "technology". I said information technology has not increased productivity growth compared to the past. This is an observation of fact.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 5:25 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > Productivity metrics were better when businesses were run on just pen and paper

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        This is what you said.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • eiksjs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Is there a way to mute people who are clearly AI boosters? ^

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • simianwords

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ? you are literally commenting on the release of a new model from OpenAI in a tech focused community. Have you considered what should be normal here?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • aiaiai177

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Downvoted by the AI Nazis. They are running a tight ship before the IPOs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I downvoted it because it doesn't add anything useful to the conversation, and I don't own any AI stock.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It's a hypothesis that "smarter" AI models, ie GPT-5.5, may not be a great boon to productivity. Given that this is the raison d'etre of AI models, and improving them, I don't see why it is any less useful than any other discussion.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • yesterday at 6:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • AbuAssar

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    This is the first time openAi include competing models in their benchmarks, always included only openAi models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • tantalor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > A playable 3D dungeon arena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Where's the demo link?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • zerotosixty

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Those who are using gpt5.5 how does it compare to Opus 4.6 / 4.7 in terms of code generation?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • renecito

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          why the stats of every AI on every release looks around the same?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Are the tests getting harder and harder so the older AIs look worst and the new ones look like they are "almost there" ?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • gordonhart

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 2:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Yes, once benchmarks get saturated they get replaced by harder ones. You don’t see GSM8K, MMLU, or HellaSwag anymore because they’re essentially solved. It takes constant work to make benchmarks hard enough to show meaningful model performance differences but easy enough to score higher than the noise threshold.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • immanuwell

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 8:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Big claims from OpenAI as usual - GPT-5.5 sounds impressive on paper, but we've been down this road before, so I'll believe the 'no speed tradeoff' part when I see it in the wild

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • yesterday at 11:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Pooge

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 8:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Up until now I only paid LLM subscriptions to Anthropic but I'm going to give ChatGPT a chance when my current subscription runs out next month.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • faxmeyourcode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  How does it compare to mythos?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • adam12

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 9:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "Sometime with GPT-5.5 I become lazy"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I don't want to be lazy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • embedding-shape

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 11:36 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Meanwhile, me being lazy is what makes me a better developer. If I wasn't lazy, I wouldn't be able to program either I think.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • objektif

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Are there faster mini/nano versions as well?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • tedsanders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Not this time, no.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • abi

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Usually, those get released a few weeks later.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Schlagbohrer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          entering this comments area wondering if it will be full of complaints about the new personality, as with every single LLM update

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cchrist

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Which is better GPT-5.5 or Opus 4.7? And for what tasks?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • senko

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I prefer to see for myself, but the gradual rollout, combined with full-on marketing campaign, is annoying.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • user34283

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 8:35 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I used it last night for iOS app development and it felt like a noticeable improvement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  With the Pro plan it was available in both Codex and ChatGPT already when I first checked, which was within an hour of the release.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • phillipcarter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ... sigh. I realize there's little that can be done about this, but I just got through a real-world session determining of Opus 4.7 is meaningfully better than Opus 4.6 or GPT 5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Anyways, still exciting to see more improvements.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • egorfine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > We are releasing GPT‑5.5 with our strongest set of safeguards to date

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  So we should be expecting to not be able to check our own code for vulnerabilities, because inherently the model cannot know whether I'm feeding my code or someone else's.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • dannyw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Hopefully not, because checking your codebase for vulnerabilities is really valuable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I hope it’s just limits on pentesting and stuff, and not for code analysis and review.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • lucb1e

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 11:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          But how do it know?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • vardump

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I just can't bear to use services from this company after what they did to the global DRAM markets.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm not trying to make any kind of moral statement, but the company just feels toxic to me.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • woeirua

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Nice to see them openly compare to Opus-4.7… but they don’t compare it against Mythos which says everything you need to know.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how ā€œGPT-5.5 changes everythingā€.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • A_D_E_P_T

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Almost nobody can actually use Mythos, though?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • throwaway2027

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Good timing I had just renewed my subscription.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • I_am_tiberius

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I'd really like to see improvements like these: - Some technical proof that data is never read by open ai. - Proof that no logs of my data or derived data is saved. etc...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • anematode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I don't think this is technically possible without something like homomorphic encryption, which poses too large of a runtime cost for usage in LLMs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • I_am_tiberius

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 12:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  They don't even try to proof it another way.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • numbers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://arena.ai/leaderboard/code

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • stri8ted

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I doubt this is representative of real world usage. There is a difference between a few turns on a web chatbot, vs many-turn cli usage on a real project.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • nba456_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  This is not any better of a benchmark

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • ace2pace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I hear its as good as Opus 4.7.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The battle has just begun

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • swrrt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 4:02 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I heard someone said it is better than Opus 4.7. Recently, a lot of my friends complain about Opus 4.7 and previous models performance degradation.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • damnitbuilds

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 11:43 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Woop woop !

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Now, after all this time, this must shurely be the release that does all software developers out of a job ?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Or has Dirty Sam being caught lying, again ?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Cos I've still got a programming job, and GPT can't do it for shit.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • nickandbro

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I just prompted GPT-5.5 Pro "Solve Nuclear Fusion" and it one shotted it (kidding obviously)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • yesterday at 7:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • theihtisham

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 2:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        i just installed Codex and And Gave try to GPT 5.5 Its Good As compare to previous one

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • debba

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Cannot see it in Codex CLI

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • boring-human

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Did you upgrade the tool binaries? I also couldn't see it until after the upgrade.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • PilotJeff

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 2:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            So exhausted from all this endless bs…. Keep releasing , this reminds me of all the .com software during that era where wow we are already at version 3.0 it’s only been 60 Days

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • c0rruptbytes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              literally cannot launch the codex app anymore

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • aussieguy1234

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 12:28 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If SWE-Bench Verified is no longer a good measure of agentic coding abilities, what benchmark now is?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • journal

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  does it have cached pricing?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jawiggins

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    What is the major and minor semver meaning for these models? Is each minor release a new fine-tuning with a new subset of example data while the major releases are made from scratch? Or do they even mean anything at this point?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gck1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Nothing. The next major increment is going to happen when marketing department is confident they can sell it as a major improvement without everyone laughing at them. Which at this point seems like never.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I think Anthropic fearmongering and "leaks" of Mythos was them testing the ground for 5.x, which seems to have backfired.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • elAhmo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Is Codex receiving 5.4 or 5.5 release?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I am still using Codex 5.3 and haven't switched to GPT 5.4 as I don't like the 'its automatic bro trust us', so wondering is Codex going to get these specific releases at all in the future.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • jedisct1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 9:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        GPT-5.4 is already an incredible model for code reviews and security audits with the swival.dev /audit command.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The fact that GPT-5.5 is apparently even better at long-running tasks is very exciting. I don’t have access to it yet, but I’m really looking forward to trying it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • wslh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Related and insightful: "GPT-5.5: Mythos-Like Hacking, Open to All" [1].

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [1] https://news.ycombinator.com/item?id=47879330

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • ant6n

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            My impression has been that ChatGPT-5.4 has been getting dumber and more exhausting in the last couple of weeks. Like it makes a lot of obvious mistakes, ignores (parts of) prompts. keeps forgetting important facts or requirement.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Maybe this is a crazy theory, but I sometimes feel like they gimp their existing models before a big release to you'll notice more of a "step".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • atmanactive

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 1:08 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Definitely feels like it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • varispeed

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I am sceptical. The generation after 4o models have become crappier and crappier. Hope this one changes the trend. 5.4 is unusable for complex coding work.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • mondojesus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I'm still using 5.3 in codex. Are 5.4 and 5.5 better than 5.3 in concrete ways?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cbg0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The benchmarks say so, but try it out with actual tasks and be the judge.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • enraged_camel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Is this the first time OpenAI compared their new release to Anthropic models? Previously they were comparing only to GPT's own previous versions.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • k2xl

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ARC-AGI 3 is missing on this list - given that the SOTA before 5.5 <1% if I recall, I wonder if this didn't make meaningful progress.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • redox99

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's a silly benchmark anyways.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • cmrdporcupine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Not rolled out to my Codex CLI yet, but some users on Reddit claiming it's on theirs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • xnx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Next up: Google I/O on May 19?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I have to imagine they'll go to Gemini 3.5 if only for marketing reasons.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • luqtas

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          they are using ethical training weights this time!!! /j

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • throwaw12

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            If anyone tried it already, how do you feel?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Numbers look too good, wondering if it is benchmaxxed or not

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • i_love_retros

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Oh shiiiiit boy! An incrementation dropped!!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • yuvrajmalgat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                finally

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • baxuz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 8:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Ah yes, the next "trust me bro"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • DrokAI

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 4:54 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • max2026

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 3:32 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • minhajulmahib

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 4:24 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • goldfish_gemma4

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:11 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • charliecs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • hiverrbeyy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • lukebechtel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • 1515874411

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jeremie_strand

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • yuvrajmalgat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • marsven_422

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 4:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        [dead]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • wiseowise

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [flagged]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • MagicMoonlight

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Two hundred pages of shilling and it’s a 1% improvement in the benchmarks. They’re dead in the water.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Imagine spending 100m on some of these AI ā€œgeniusesā€ and this is the best they can do.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • XCSme

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              2x the price for 1-5% performance gain

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • justonepost2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                the attenuation of man nears

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                < 5 years until humans are buffered out of existence tbh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                may the light of potentia spread forth beyond us

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • coderssh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Great modal, I have been using codex and its awesome. Lets see what GPT-5.5 does to it