Codex logging bug may write TBs to local SSDs

362 points - today at 7:30 AM

  • ewsbr

    today at 6:22 PM

    Looks like this was fixed[0], so it should land in the next release.

    [0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...

    • b--l

      today at 8:05 AM

      Codex is one of the most infamous examples of slopware. Just having the window unhidden on my Mac will cause it to use 100% of the GPU displaying the spinner message.

      THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

      So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).

      The issue is on GitHub and close to 6 months old, probably dating back to the release of this vibe-coded junk. I would literally fix it myself, but it's closed source for whatever reason.

      There are many discussions about which model is better, or whether vibe coding is even possible. I'd point you to the extent of what one of the most well-funded, cash-flush, well-staffed model-making companies can do with vibe coding.

      To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on Polymarket expects them to have a leading model any time soon, for example.

      It's a tragedy. The world needs competition for Anthropic.

        • jofzar

          today at 9:27 AM

          > Codex is one of the most infamous examples of slopware

          Woah, let's not forget Claude Code is right there

            • me551ah

              today at 1:16 PM

              Claude is also weird for being the only coding assistant that, for some reason, doesn't support AGENTS.md. Codex, Amp, Cursor: all of them support it and read from it, but not Claude, which forces its users to use CLAUDE.md instead.

              This is the highest-voted issue on their GitHub repo: https://github.com/anthropics/claude-code/issues/6235

                • ValentineC

                  today at 1:27 PM

                  My CLAUDE.md is just:

                      @AGENTS.md
                  
                  And Claude processes it just fine.

                  (I see that it's a common workaround, and there's a comment in the above link saying just this: https://github.com/anthropics/claude-code/issues/6235#issuec...)

                  It's a hassle having to add it to every repo that I use Claude with though, and I often use other models and harnesses too for the more trivial tasks.

                    • troupo

                      today at 4:11 PM

                      I beg people to learn what symlinks are. The fact that "put @AGENTS.md in there" is a "common workaround" shows why programmers (good ones at least) are not going anywhere soon.

                        • fortuitous-frog

                          today at 5:04 PM

                          I used to use a symlink but was concerned that Claude might see the presence of an "AGENTS.md" file (in, e.g., a "List Files" tool call output or from a direct `ls`), get curious, and attempt to read it directly (not knowing that it's the same as the "CLAUDE.md" file auto-injected by the harness), essentially doubling the token impact / context bloat. Indeed, I did some local experimentation and noticed this was the case, which is why I switched to the explicit "@AGENTS.md" approach.

                          So perhaps there's no need to be rude about it :)

                          • lijok

                            today at 6:17 PM

                            Good programmers know symlinks are not portable

                            • ValentineC

                              today at 4:36 PM

                              One bonus to this approach is that I can add Claude Code-specific stuff in there, that I wouldn't need for other harnesses.

                              • layer8

                                today at 4:34 PM

                                Symlinks aren’t portable.

                                • dpkirchner

                                  today at 5:05 PM

                                  I'm pretty sure some agent harnesses read both files when present, so this @ "aliasing" is more token efficient.

                                  • ahtihn

                                    today at 4:22 PM

                                    Symlinks are a pain if you're on Windows, I'd rather not bother with them.

                              • chorkpop

                                today at 2:06 PM

                                CLAUDE.md has been incredibly successful for them advertising wise. I wouldn’t expect them to admit AGENTS.md exists anytime soon.

                                • datsci_est_2015

                                  today at 5:01 PM

                                  Literally trying to use file naming to build a moat. “We can’t switch to Cursor, we’d have to rename all of our files from CLAUDE.md to AGENTS.md!”

                                  • bandrami

                                    today at 3:09 PM

                                    So there's this amazing thing called a symlink

                                    • anon373839

                                      today at 1:28 PM

                                      Gee, I wonder why that is. Do you think Anthropic’s Claude Code team are just trying to protect humanity somehow? Maybe our mortal brains just can’t comprehend the damage that supporting a non-Claude-branded standard might do...

                                    • hexsprite

                                      today at 2:39 PM

                                      I created a Claude Code plugin to load AGENTS.md. Uses symlinks but it’s better than no support.

                                      https://github.com/hexsprite/claude-agents-md

                                        • arcanemachiner

                                          today at 3:46 PM

                                          Claude Code supports native imports: `@AGENTS.md`

                                            • OJFord

                                              today at 3:49 PM

                                              To be fair if you can do it through some kind of plugin or skill it does spare you having a CLAUDE.md of `@AGENTS.md` in every repo individually.

                                              • troupo

                                                today at 4:11 PM

                                                You two realise that symlinks exist, right? That you need neither a "plugin" nor a "native import"?

                                    • kokada

                                      today at 10:36 AM

                                      Not that Claude Code is much better: I just hit this issue[1], where setting DO_NOT_TRACK=1 seems to be enough to cause some really strange behavior in the newest versions of CC.

                                      [1]: https://github.com/anthropics/claude-code/issues/69238#issue...

                                      Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.

                                      • mvATM99

                                        today at 9:49 AM

                                        Yeah exactly.

                                        I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post

                                          • matheusmoreira

                                            today at 12:19 PM

                                            At least game engines manage to render their frames properly. Claude Code sometimes eats entire paragraphs of text output, resulting in things such as numbered lists jumping from 2 to 4 out of nowhere.

                                            I'd just ask Claude to repeat himself at first but it happens so often that I actually made a little tool to dig up the output inside the session history and present it properly in a separate terminal.

                                            • TacticalCoder

                                              today at 12:12 PM

                                              > I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post

                                              The bigger issue is that they somehow thought it was "cool" and "advanced" when it's just a kludgy, Rube Goldberg-esque, monstrous hack.

                                              Which of course only semi-works. For me, the deal-breaker is that the model thinks what you see in the TUI is exactly what it output. That's of course not how it works, since apparently their "game engine" converts a headless browser's rendering on the fly into approximated characters to display in the terminal. So the model tells you it output ASCII, but people copy/pasting (because, yes, at times you want to copy/paste) get Unicode chars.

                                              Plenty of bug reports and pissed users.

                                              That's the bigger issue.

                                              The biggest issue is people thinking that needing a 10 GB VM to run a headless Electron browser, and then fuxx0ring the character conversion, is somehow an achievement.

                                        • varjag

                                          today at 11:34 AM

                                          Right, just yesterday I found my laptop kinda hot. And guess what: it was good old Claude, deciding to load up a few cores while the prompts were completely idle.

                                            • cozzyd

                                              today at 2:27 PM

                                              [dead]

                                          • sambcui

                                            today at 2:59 PM

                                            I don’t know if this resonates with you, but I feel like the vibe-coded Codex and Claude Code desktop apps are iterating way faster than they should be.

                                              • malfist

                                                today at 3:00 PM

                                                How are they iterating? I've not noticed anything major changing between versions of my Claude Code, other than that sometimes a version includes /btw and sometimes it's missing.

                                            • iLoveOncall

                                              today at 12:27 PM

                                              Surprisingly Kiro is fine (I work at Amazon but not at all on the Kiro team). I prefer it to anything else I've tried (except Amazon Q Developer in IntelliJ, but it's now deprecated).

                                                • epistasis

                                                  today at 2:28 PM

                                                  Kiro is surprisingly good. If the interface for saving and resuming were slightly more reliable, and there was hope of remote sessions, I'd probably switch to it full time. I vastly prefer it to having to fight against buggy, force-fed features like UltraPlan or whatever.

                                          • r_lee

                                            today at 10:25 AM

                                            if we are at 10x with AI and near AGI or ASI, then how is it possible that these products (Codex, Claude Code CLI) are still such garbage?

                                            shouldn't this "agentic AI revolution" have long solved this already?

                                            no way they're over there saying "we are on it plz wait" or that "it's too much effort"?

                                              • thewebguyd

                                                today at 3:18 PM

                                                > shouldn't this "agentic AI revolution" have long solved this already?

                                                Daily reminder that Anthropic took over a year to fix the Claude Code terminal flickering issue despite proclaiming all over the internet that software development was a "solved problem."

                                                Apple forked over $250 million in a class action over false advertising for Apple Intelligence. When do we start seeing the same for the misleading and outright false claims coming out of the frontier labs about model capabilities? At this point the marketing is doing more harm than the technology itself, because it's warping the perceptions of those at the top who make decisions. The only reason tokenmaxxing was ever a thing was because marketing misled execs and technology decisions were made based on vibes instead of evidence.

                                                  • mannanj

                                                    today at 6:13 PM

                                                    As long as a majority of people in the living class are gullible, naive, and sick, their behavior entrained by the institutions and media they are made to consume, they stop seeing the misleading and false claims. Or at least they see them only myopically, long enough to complain about them in an ineffective way, then go on to consume the next big lie or slop. That continues until something finally channels the accumulated rage into a cause they feel makes things right (assuming they have not already died and the next generation has not been groomed to fall for the rich man's trap), while those whose families and next generations are set to continue the extraction and trickery hide behind an anonymous personality or system.

                                                • igleria

                                                  today at 10:52 AM

                                                  This is the biggest elephant in the room I have seen in my decade+ career. At the same time, look how bad Apple is in software compared to its hardware... It's not an AI only problem, it's almost like software in general gets a free pass on being very unsafe or low quality because no one wants to face the same "profit reducing red tape" that civil engineers or similar face.

                                                    • CharlieDigital

                                                      today at 11:12 AM

                                                      Anthropic were the progenitors of the Model Context Protocol. Claude Code does not fully implement the client end of the protocol. A protocol; a literal pre-defined spec that an agent should be able to one-shot. Neither does Codex. Codex does not implement MCP Prompts.

                                                      (I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).

                                                      The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.

                                                        • deathbob

                                                          today at 1:38 PM

                                                          It still boggles my mind that Anthropic would invent the MCP protocol but not fully implement it.

                                                          Especially when fully implementing it (prompts, resources, tools) is easily done in harnesses that don’t ship with MCP but allow good extension / modification like Pi.

                                                          Claude not being able to see its own usage or self invoke slash commands is also very frustrating.

                                                            • oblio

                                                              today at 2:16 PM

                                                              > It still boggles my mind that Anthropic would invent the MCP protocol but not fully implement it.

                                                              https://www.joelonsoftware.com/2002/01/06/fire-and-motion/

                                                              > Do they just want to force you to keep busy reacting to their volleys, so you can’t move forward?

                                                                • CharlieDigital

                                                                  today at 4:12 PM

                                                                  > ...Do they just want to force you to keep busy

                                                                  Given functionally unlimited access to tokens with frontier models, there is really no "force you to keep busy"; it should just bake overnight. We're talking about a rather simple and well-defined specification; not something novel and complex.

                                                      • thewebguyd

                                                        today at 3:28 PM

                                                        > same "profit reducing red tape" that civil engineers or similar face.

                                                        I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.

                                                        A good start would be to stop allowing companies to disclaim all warranties of fitness for a particular purpose in their EULAs. The joke of Microsoft Copilot applies here: they have a big disclaimer that "Copilot is for entertainment purposes only" while the advertising says otherwise. Not even the Chrome EULA will agree that it's fit for purpose as a web browser. The clause is a get-out-of-jail-free card that shifts all liability and risk to the end user.

                                                          • datsci_est_2015

                                                            today at 5:09 PM

                                                            > I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.

                                                            Liability is how a credential body would organically grow. It already exists in the security, compliance, and enterprise parts of the software world.

                                                        • forshaper

                                                          today at 2:34 PM

                                                          How much of all this is due to hardware improving, and software bloating enough to fill the capacity?

                                                      • hombre_fatal

                                                        today at 12:14 PM

                                                        Like anything, you have to decide between polish vs switch to any other task in the queue. If you choose too much from the latter, then polish suffers, yet that's a human thing.

                                                        Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.

                                                        It's kind of like how HNers would claim to your face that you can't actually build anything with JavaScript and Node.js (JS just sucks too much), then list off a few footguns that were supposed to demonstrate why. In other words, champing at the bit to dunk on JS led people to catastrophize issues that were pretty mediocre.

                                                          • geodel

                                                            today at 2:55 PM

                                                            > yet that's a human thing.

                                                            Is this a joke?

                                                            Here we are talking about trillion-dollar AI companies who claim AI can fix decade-old bugs and create new compilers, OSes and what not. Are parallel agents working autonomously to fix issues as well as create new features not allowed at these companies?

                                                              • hombre_fatal

                                                                today at 3:06 PM

                                                                Humans still decide what LLMs do in a code base, full stop.

                                                            • coldtea

                                                              today at 1:19 PM

                                                              >Like anything, you have to decide between polish vs switch to any other task in the queue

                                                              Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?

                                                              >Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.

                                                              Why shouldn't it? They're not the ones making the extraordinary claims.

                                                                • hombre_fatal

                                                                  today at 2:58 PM

                                                                  > Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?

                                                                  Because your code still only moves forward at some finite number of tokens per second. You have to decide where they are allocated: polish or the next thing. Humans are still the ones prompting LLMs and deciding what gets done.

                                                                  > isn't that what they claim? Why shouldn't it? They're not the ones making the extraordinary claims.

                                                                  Even if I grant that someone else makes excessive claims, why would that let you off the hook to stay grounded?

                                                                  Though I don't grant it. Maybe if Anthropic claimed that Opus makes all decisions at the company and builds all software without humans doing all the prompting, the critics would make more sense.

                                                                  Until then, it looks more like a double standard: if software built with AI has any issues, then see, AI is shit and the humans who invoked it had no role in it. e.g. it could be the case that Anthropic's Claude Code engineers just aren't doing as much polish as they should.

                                                                  Better answer: Someone asked why might it be the case that AI-written software has issues, and it has a real answer. Marketing claims are a different conversation.

                                                                    • geodel

                                                                      today at 3:13 PM

                                                                      > Maybe if Anthropic claimed that you could write an unsupervised loop that writes perfect software, the critics would make more sense.

                                                                      Or, being the upstanding, ethical companies that they are, just put a disclaimer after every prompt response and on their website: "AI-generated code has absolutely no guarantee of quality or correctness. The human prompter must be held accountable for any mistakes or inaccuracies."

                                                                      Hope it wouldn't be too much bother to these important companies.

                                                                        • thewebguyd

                                                                          today at 3:31 PM

                                                                          See, but that would counteract all of their marketing and hurt the feelings of all the execs who desperately want to believe that software development is "solved" and that in the near future they won't have to hire those expensive, pesky developers ever again.

                                                                            • geodel

                                                                              today at 4:51 PM

                                                                              Two trends I see at work:

                                                                              1) No more human written code in projects, all code must be AI generated.

                                                                              2) Developers are responsible for all code AI generated.

                                                                              Combine that with fear of losing your job and you have no one calling out management's bullshit to their face.

                                                          • jeffybefffy519

                                                            today at 11:41 AM

                                                            Because vibe coding is a toy… that's the secret.

                                                            You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.

                                                            • ValentineC

                                                              today at 1:32 PM

                                                              The "AI revolution" feels like it's creating a bunch of ultra-smart AI models that are scarily good at cracking most human-created security (Mythos), but also happen to be careless snobs that leave litter and mess in their wake.

                                                              • layer8

                                                                today at 4:39 PM

                                                                The issue is that apparently AI coding means that developers stop caring about software quality. Which puts the whole purpose into question.

                                                                • mnicky

                                                                  today at 12:40 PM

                                                                  If code churn is high, investment in refactoring etc. is less beneficial than it may seem. I don't remember the details, but I heard on some podcast that the Claude Code codebase changes so fast that no piece of code stays around for long.

                                                                    • coldtea

                                                                      today at 1:21 PM

                                                                      In other words it's an ever moving vibe fest, with random bugs and misbehaviors each time they roll the dice...

                                                                        • tartoran

                                                                          today at 1:34 PM

                                                                          Yes, it’s very characteristic of gen-AI era.

                                                                      • today at 1:51 PM

                                                                        • tartoran

                                                                          today at 1:32 PM

                                                                          If they respected their users they’d at least pin some versions that are more stable.

                                                                      • fg137

                                                                        today at 10:53 AM

                                                                        You are asking too many good questions.

                                                                        • user43928

                                                                          today at 11:11 AM

                                                                          The products generally work just fine on my MacBook.

                                                                          I have not encountered major issues in either the Claude Code CLI, the Codex Desktop app, or Claude Desktop app.

                                                                          They generally get the job done. I don't measure disk writes or analyze the GPU usage.

                                                                          • today at 12:11 PM

                                                                            • reducesuffering

                                                                              today at 5:38 PM

                                                                              Claude Code has been out for just 1 year and already has millions of users, contributing significantly to roughly $40 billion in revenue. By any measure it is one of the fastest-developed products, already driving the most important workflow for millions of people.

                                                                              "Why isn't literally everything about a product that came out a year ago with an extremely fast scaling userbase solved?" is what I hear.

                                                                              The goalposts will keep moving until AGI is undeniable.

                                                                              • Zababa

                                                                                today at 11:40 AM

                                                                                A simple explanation is that they are "good enough" for most people and they have better things to do. Even if tomorrow I was 100 times as productive, I still wouldn't have time to do literally everything and I would have to prioritize.

                                                                                  • coldtea

                                                                                    today at 1:22 PM

                                                                                    You might not.

                                                                                    But the Claude Code team has ONE job.

                                                                                    And they have full access to a platform that they advertise as "humanity-threat" level good, and claim that it can automate everything code related...

                                                                                      • Zababa

                                                                                        today at 1:53 PM

                                                                                        I think they have more than one job, they have to balance new features with improving the software itself. And Anthropic has to balance investing resources into Claude Code vs on infra or other things.

                                                                                        Not that I'm happy with the current state of things. In fact, I'm quite sad that improvements in the capacity to do things don't translate into better quality.

                                                                                          • troupo

                                                                                            today at 4:16 PM

                                                                                            > they have to balance new features with improving the software itself.

                                                                                            What new features?

                                                                                            > And Anthropic has to balance investing resources into Claude Code vs on infra or other things.

                                                                                            It seems they are doing neither? Their vibe-coders boast everywhere that they no longer even work, but just endlessly prompt Claude Code in a loop. Perhaps that's why there's no polish? Perhaps that's why their spring post about Claude Code issues reads like "these are all issues that would take a junior programmer a day to test and fix before they ever reached production"? https://www.anthropic.com/engineering/april-23-postmortem

                                                                            • nicce

                                                                              today at 9:57 AM

                                                                              Not only Codex: I can't leave the ChatGPT app open on macOS for a few hours, because it consumes 60 gigabytes of RAM over time and crashes all my other apps.

                                                                              Mind-boggling. And I can't use Google's AI Studio in the browser because it takes 100% CPU.

                                                                              Do I need to write my own app for everything???

                                                                                • nbaksalyar

                                                                                  today at 4:13 PM

                                                                                  It's not just Google AI Studio, it's also Google proper. Just one search result page consumes gigabytes of RAM. How did this happen? I've switched to DDG and never looked back.

                                                                                  • veber-alex

                                                                                    today at 12:16 PM

                                                                                    ChatGPT works OK for me, but WhatsApp consumes 1000% CPU after the Mac wakes from sleep.

                                                                                    I swear a few years ago shit like this didn't happen on macOS.

                                                                                      • coldtea

                                                                                        today at 1:24 PM

                                                                                        A few years ago vibe-coded crap apps like that didn't exist on macOS.

                                                                                    • porridgeraisin

                                                                                      today at 10:03 AM

                                                                                      The damn chat.openai.com webapp lags a lot on long chats as well; typing takes forever.

                                                                                        • rsfern

                                                                                          today at 12:46 PM

                                                                                          In my experience the input field lags on short chats too, sometimes in the middle of writing the second or third prompt. Are they running some kind of prospective evaluation or something?

                                                                                            • nicce

                                                                                              today at 1:06 PM

                                                                                              When you are writing a completely new prompt, it sends every character to the server as you type and tries to make suggestions based on that.

                                                                                              It keeps doing this at intervals via the /prepare endpoint during each prompt.

                                                                                              So if you are working with something sensitive, don't type it directly into the browser and edit it there.

                                                                                                • porridgeraisin

                                                                                                  today at 1:20 PM

                                                                                                  But then why does it become dog slow when the chat becomes bigger?

                                                                                  • Zenul_Abidin

                                                                                    today at 5:53 PM

                                                                                    I have been wondering why my battery dies so quickly when I have Codex open, even when it's only in my tray.

                                                                                    I only noticed the CPU spike because I also have Process Explorer in my tray.

                                                                                    • giancarlostoro

                                                                                      today at 1:59 PM

                                                                                      > It's a tragedy. The world needs competition to anthropic.

                                                                                      I agree, though Sam Altman's company is the last option I'd want to replace Claude with. I would sooner exhaust every open model.

                                                                                      • xpct

                                                                                        today at 9:33 AM

                                                                                        Well, thank you for your service. I thought about trying out Codex after the disaster that is Claude Code. I'll be fine without either one on my machine.

                                                                                          • jofzar

                                                                                            today at 9:51 AM

                                                                                            IMO Codex is significantly better than Claude Code for me ATM.

                                                                                            • christophilus

                                                                                              today at 12:20 PM

                                                                                              Codex is much better, which is to say, it’s only pretty bad.

                                                                                              • comboy

                                                                                                today at 9:56 AM

                                                                                                I mean, Codex CLI is really bad. But Claude's CLI is so much worse.

                                                                                                Welcome to the world of tomorrow!

                                                                                            • CryZe

                                                                                              today at 1:32 PM

                                                                                              > THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

                                                                                              This seems to be a common Chromium problem across tons of software. GitHub has the same issue with its spinners, VSCode as well.

                                                                                              • markdog12

                                                                                                today at 12:36 PM

                                                                                                This software has been terrible for me. Burns tokens like crazy, and fails. Most times I try to use the browser plugin, it just says it can't use the plugin. When it does work, it takes minutes to click a button. Unusable workflow.

                                                                                                I ask it to generate a PNG with an alpha channel. It can't. Instead, it outputs a chroma-keyed image, then generates a Python script to remove the chroma key (fails), then a JS script (which also fails). Then my 5h allotment is up.

                                                                                                It's frustrating because if it worked as they advertise, it'd be an amazing tool.

                                                                                                  • EMM_386

                                                                                                    today at 1:02 PM

                                                                                                    Although they can technically do it, I wouldn't be asking LLMs to generate binary files like PNG with alpha channels, no matter how simple that may seem. If it's easy enough to manually create one yourself, I would do that.

                                                                                                    The best way for LLMs to do this is likely to write a scratch program (which is what it seems to have reached for in the second half), write code (which they are good at) and have the library create the image.

                                                                                                    At some point it is just easier to handle such things yourself, and use them with text-based formats.
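EMM_386's suggestion (have the model write a small program and let code produce the image) can be sketched roughly as follows. This is a minimal example under stated assumptions: it uses only Python's standard library, hand-encoding the PNG with `zlib` and `struct` instead of an imaging library such as Pillow, and the function names are hypothetical. The image uses PNG color type 6 (8-bit RGBA), so transparency comes from a real alpha channel rather than a chroma key.

```python
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def encode_rgba_png(width: int, height: int, pixels: list[tuple[int, int, int, int]]) -> bytes:
    """Encode row-major RGBA pixels as an 8-bit-per-channel PNG (color type 6)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of every scanline
        for r, g, b, a in pixels[y * width:(y + 1) * width]:
            raw.extend((r, g, b, a))
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), compression 0, filter 0, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"  # fixed PNG signature
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(bytes(raw)))
        + png_chunk(b"IEND", b"")
    )


def circle_pixels(size: int) -> list[tuple[int, int, int, int]]:
    """Opaque red disc on a fully transparent background."""
    center = (size - 1) / 2
    radius_sq = (size / 2) ** 2
    out = []
    for y in range(size):
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius_sq
            out.append((255, 0, 0, 255) if inside else (0, 0, 0, 0))
    return out
```

To write it to disk, something like `open("circle.png", "wb").write(encode_rgba_png(64, 64, circle_pixels(64)))` should do. Every scanline uses filter type 0, which is valid PNG but compresses worse than what a real encoder would choose.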

                                                                                                • fps-hero

                                                                                                  today at 3:24 PM

                                                                                                  > THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

                                                                                                  One conspiratorial idea I had was that this isn't a bug, and that Codex was actually doing computation on users' hardware under the guise of "thinking". Like Folding@home, or bitcoin mining malware, involuntarily on paying customers. Your usage is being subsidized by your personal compute hardware that you can't take advantage of unless it was being applied at massive scale.

                                                                                                  This would make even more sense when you consider that thinking and response-time metrics aren't being publicly tracked. There is an assumption that LLM interactions are processed as fast as possible, but this doesn't align with the reality of fixed hardware and oversubscription. Of course throttling is occurring. So, if you can take advantage of local compute and delay the responses, you have even more excess compute!

                                                                                                  I find it difficult to believe that given the scale, number of users, and money involved, that someone hasn't fixed this "bug".

                                                                                                    • CSMastermind

                                                                                                      today at 3:28 PM

                                                                                                      Lol this was my theory as well.

                                                                                                    • l33tman

                                                                                                      today at 8:44 AM

                                                                                                      This was fixed long ago, if I'm thinking of the same bug. It was stuck in an inf loop all the time the codex window was open.

                                                                                                        • cncjvu7

                                                                                                          today at 9:05 AM

                                                                                                          Nah it's still doing weird shit. Uninstalled that crapware last week.

                                                                                                      • seviu

                                                                                                        today at 10:04 AM

                                                                                                        To be fair to Codex, you can use any harness you want with it. Access is not gatekept by a crappy, slop-filled Electron app.

                                                                                                        So just move to Pi, or whatever.

                                                                                                        Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. At all.

                                                                                                        Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.

                                                                                                        Side note unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but the easy lifting is now done elsewhere.

                                                                                                          • coldtea

                                                                                                            today at 1:27 PM

                                                                                                            >if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. at all.

                                                                                                            Of all the issues, this seems like the most tame. I mean, there are single Chrome tabs that can use 300MB or even 700MB. A 2GB VM for what is likely isolated local testing of scripts and commands or local lightweight first-level inference to help guide the main harness sounds reasonable.

                                                                                                            • thewebguyd

                                                                                                              today at 3:58 PM

                                                                                                              Not being able to use my own harness on the subscription plan is my biggest gripe with Anthropic/Claude. For what I work on, I still get better results with Opus than I do with GPT5.5-codex, but damn do I hate that I either have to PAYG or I'm stuck using Claude Code.

                                                                                                              • drdexebtjl

                                                                                                                today at 2:52 PM

                                                                                                                I have never tried Cowork, yet Claude Desktop still put a 10 GB VM image on the tiny internal storage of my MacBook.

                                                                                                                No way to remove it without hacks like creating an empty, read-only file in its place.

                                                                                                                Having this slop installed and automatically updating is a liability.

                                                                                                            • xenator

                                                                                                              today at 11:38 AM

                                                                                                              I have exactly the same problem with the Time Machine spinner on macOS. It doesn't even rotate.

                                                                                                              Somewhere there must be rare, certified specialists capable of fixing such problems, with waiting lists years long.

                                                                                                              • ljlolel

                                                                                                                today at 2:12 PM

                                                                                                                Building an open source native swift version that doesn’t have that bug: https://github.com/Lore-Hex/Quillcode

                                                                                                                • tengada1

                                                                                                                  today at 1:08 PM

                                                                                                                  I had the exact same frustration and switched to Pi and have had zero complaints

                                                                                                                  • hokkos

                                                                                                                    today at 9:58 AM

                                                                                                                    Is it closed source? I can see the Rust code in the repo, unlike the JS in the Claude Code repo. Are you mixing them up?

                                                                                                                      • nicce

                                                                                                                        today at 10:10 AM

                                                                                                                        Codex CLI is the main Rust code. There is Codex Desktop separately, using Electron and the same Codex CLI.

                                                                                                                    • NamlchakKhandro

                                                                                                                      today at 1:04 PM

                                                                                                                      Pi mono is the only true harness. Everything else is crap

                                                                                                                        • Supermancho

                                                                                                                          today at 2:43 PM

                                                                                                                          If Pi can't use my MCPs, it's too big a step backward. Is https://github.com/nicobailon/pi-mcp-adapter the common tooling?

                                                                                                                            • epistasis

                                                                                                                              today at 3:19 PM

                                                                                                                              I imagine the answer varies greatly, but what use cases do people find for MCP over standard command-line calls? The only time I use MCP is when I'm supposed to test that an MCP is working. Everything else is just incredibly clunky and ugly. For example, to grab a ticket from Linear, it's far easier and cleaner to copy and paste the text than to have the agent call the MCP and look it up by ticket name.

                                                                                                                              I'm hoping I can find somebody else's MCP that could actually help me for once!

                                                                                                                                • Supermancho

                                                                                                                                  today at 6:00 PM

                                                                                                                                  I use a Godot MCP in addition to my Godot IDE and the Figma MCP

                                                                                                                                  The Godot Engine API (hundreds of calls) is not worth memorizing, and Figma is visual, along with a complicated engine-specific serialization.

                                                                                                                                  It's much easier to ask an LLM to find out why one thing affects another, or how to optimally generate a specific change, by exploring those APIs. And when the LLM eventually fails, it can still explain the available functions and help triage a solution.

                                                                                                                      • jorl17

                                                                                                                        today at 12:49 PM

                                                                                                                        Claude code (desktop) and Codex (desktop) are both absolutely dogshit pieces of software. I can't pick which one is worse. I'd be sort of ashamed to say I actively worked on them, regardless of how they can empower people. Cursor's new UI is similarly terrible. They're all slowly getting better, but too slow for my taste.

                                                                                                                        They are incredibly slow in unpredictable ways, eat up memory at an insane rate, and just feel like they were built with no regard for UX. It's as if they crammed together engineers with no idea how to build a coherent, predictable UI and let them loose on the product without proper designers.

                                                                                                                        The other day Codex (desktop) was eating up 70GB of RAM on my machine. What had I done? Literally nothing. I opened it and let it update once.

                                                                                                                        Another one with Codex: I had a specific conversation with no activity happening that would make the app spin up all of my CPU cores, rendering it barely usable. It would take seconds to react to anything or update the UI. The conversation wasn't even in focus!!! Restarting the app didn't help. After I archived it, things suddenly got better.

                                                                                                                        Claude Code Desktop used to be so, soo, soo slow and eat up so much RAM. It was unusable for anything other than playing around when I first tried it. It also didn't communicate what it would do. Using it was like living in a world with no affordances, constantly afraid that interacting with it would trigger some sort of destructive action. Still, it has definitely been improving in terms of the UI experience.

                                                                                                                        Cursor's new agents mode suffers from similar issues. Obscenely slow, hogging CPU without anything going on, breaking with existing UX patterns (some of them already well implemented in their other, more polished, previous version), confusing buttons and labels which don't explain what to do and that sometimes do destructive operations on your code.

                                                                                                                        My favorite cursor absurdity is that if you use their workflow to create a worktree and the worktree setup script fails, the following happens:

                                                                                                                        1. The agent has no idea that it failed, let alone has any logs of the failure.

                                                                                                                        2. Often you yourself don't get access to the logs of what failed in that script. Don't ask me, half the time it just says it failed with no further logs.

                                                                                                                        3. When you do get the logs, you cannot copy them in ANY way. You can't even select them. I have had to resort to taking a screenshot and running OCR on it.

                                                                                                                        I've also had cursor repeatedly have concurrency/race condition bugs when creating multiple worktrees in parallel. I have 5 tasks, I spin them up all together so they can create 5 worktrees and they crash with random internal cursor errors. Wasn't the point of this abhorrent new UI you've stuffed me with to enable parallelism?

                                                                                                                        It's like people aren't even testing the shit they ship. Which I guess they aren't.

                                                                                                                        I'm a big believer in AI and think it is changing the world and will continue to do so, but I almost get offended at how bad these products for which I am paying (sometimes quite a lot!) are. There's "move fast and break stuff" and then there's "build crap to call stuff".

                                                                                                                          • ljlolel

                                                                                                                            today at 2:11 PM

                                                                                                                            That’s why I’m building an open source native Swift version: https://github.com/Lore-Hex/Quillcode

                                                                                                                            • iknowstuff

                                                                                                                              today at 2:09 PM

                                                                                                                              I’ve been using Codex and Claude in Zed via ACP. Some bugs but overall very pleasant experience vs anything Cursor.

                                                                                                                                • energy123

                                                                                                                                  today at 10:54 AM

                                                                                                                                  Let me guess, there's also a bug where they train on all our data?

                                                                                                                                    • varjag

                                                                                                                                      today at 11:37 AM

                                                                                                                                      They don't need to. You pay them for the privilege to do black box reinforcement learning already.

                                                                                                                              • woadwarrior01

                                                                                                                                today at 8:13 AM

                                                                                                                                Someone posted a temporary workaround for this on X[1].

                                                                                                                                sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"

                                                                                                                                Also, I found that running VACUUM FULL on the sqlite file on my laptop shrunk it from 27GB to a mere 73MB[2].

                                                                                                                                [1]: https://xcancel.com/bdsqlsz/status/2067964486615810369

                                                                                                                                [2]: https://xcancel.com/jeethu/status/2068087449469780434
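Both tricks above can be tried on a throwaway database first. Below is a minimal Python sketch using the standard `sqlite3` module. It assumes a made-up two-column `logs` table (the real schema inside `~/.codex/logs_2.sqlite` isn't shown in the thread) and a temporary file rather than the real Codex database. It shows that deleting rows alone leaves the file at full size, that `VACUUM` reclaims the freed pages (SQLite's statement is plain `VACUUM`; `VACUUM FULL` is PostgreSQL syntax), and that the `RAISE(IGNORE)` trigger silently discards inserts instead of raising an error. The `demo` and `page_count` helpers are hypothetical names for illustration.

```python
import os
import sqlite3
import tempfile


def page_count(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA page_count").fetchone()[0]


def demo(path: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: bloat a log table, vacuum it, then block inserts."""
    # isolation_level=None is autocommit mode, so VACUUM never runs inside
    # an implicit transaction (SQLite refuses to VACUUM inside one).
    conn = sqlite3.connect(path, isolation_level=None)
    # Must be set before any table exists; NONE keeps freed pages in the file.
    conn.execute("PRAGMA auto_vacuum = NONE")
    # Assumed schema, for illustration only.
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")

    # Simulate runaway logging inside one explicit transaction (fast bulk insert).
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                     (("x" * 500,) for _ in range(5000)))
    conn.execute("COMMIT")
    bloated = page_count(conn)

    # Deleted rows' pages go to the freelist; the file keeps its size.
    conn.execute("DELETE FROM logs")
    after_delete = page_count(conn)

    # VACUUM rebuilds the database file without the free pages.
    conn.execute("VACUUM")
    after_vacuum = page_count(conn)

    # The trigger from the comment: every INSERT on logs becomes a silent no-op.
    conn.execute("CREATE TRIGGER IF NOT EXISTS block_log_inserts "
                 "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END")
    conn.execute("INSERT INTO logs (msg) VALUES ('dropped')")
    rows = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()
    return bloated, after_delete, after_vacuum, rows


with tempfile.TemporaryDirectory() as tmp:
    print(demo(os.path.join(tmp, "logs_demo.sqlite")))
```

Because the trigger turns every log write into a no-op, running `DROP TRIGGER block_log_inserts;` against the same file removes the workaround again.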

                                                                                                                                  • sgarland

                                                                                                                                    today at 1:30 PM

                                                                                                                                    DB-level rules saving the day once again.

                                                                                                                                    • NamlchakKhandro

                                                                                                                                      today at 1:07 PM

                                                                                                                                      The real solution is to stop using it and switch to Pi

                                                                                                                                        • woadwarrior01

                                                                                                                                          today at 1:19 PM

                                                                                                                                          I’ve been using oh-my-pi with GLM-5.2 xhigh as the main model and GPT-5.5 medium as its advisor model. IMO, the combo works better than either of those models alone.

                                                                                                                                    • christophilus

                                                                                                                                      today at 12:53 PM

                                                                                                                                      Well, everyone's bashing on OpenAI as well they should, but just a reminder, unlike Claude Code, Codex is officially available to customize here: https://github.com/openai/codex

                                                                                                                                      It's fairly easy to patch.

                                                                                                                                        • redox99

                                                                                                                                          today at 2:58 PM

                                                                                                                                          That's the CLI, not the codex app which is proprietary.

                                                                                                                                            • milkshakes

                                                                                                                                              today at 5:02 PM

                                                                                                                                              the issue is in the cli and app-server

                                                                                                                                          • Lionga

                                                                                                                                            today at 2:27 PM

                                                                                                                                            [dead]

                                                                                                                                        • i2km

                                                                                                                                          today at 9:17 AM

                                                                                                                                          Shocking. Been open a week and AFAICT just silence from OpenAI. I just find it baffling. You'd think that these vendors would be very sensitive to this sort of issue. I mean, surely they have multiple agents hooked up to github monitoring potential issues and proposing fixes, right? ...right?

                                                                                                                                          Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...

                                                                                                                                            • drakythe

                                                                                                                                              today at 3:30 PM

                                                                                                                                              They're pretty bad about fixing issues, it seems. My favorite is #2472, which they demonstrated "fixing" on stage at the GPT-5 release, but the ticket is still open and the "fix" hasn't been merged. The original blog post that flagged this: https://blog.tymscar.com/posts/openaiunmergeddemo/ and the issue: https://github.com/openai/openai-python/issues/2472

                                                                                                                                              • cl3misch

                                                                                                                                                today at 4:38 PM

                                                                                                                                                There have been Issues on Github about the same problem since April. I'm using Codex a lot and I'm very happy with its performance (UX and output), but it's baffling they haven't fixed this problem.

                                                                                                                                            • neuralkoi

                                                                                                                                              today at 8:56 AM

                                                                                                                                              Vibe coding takes "move fast and break things" to a whole nother level.

                                                                                                                                                • cryo32

                                                                                                                                                  today at 9:11 AM

                                                                                                                                                  Yeah. Here I am sitting on a major incident at our company because someone’s vibe coded shit went seriously wrong.

                                                                                                                                                    • al_borland

                                                                                                                                                      today at 12:49 PM

                                                                                                                                                      I hope that ends up in the RCA to show these tools as a real risk, rather than being swept under the rug with all the blame shifted elsewhere.

                                                                                                                                                        • cryo32

                                                                                                                                                          today at 12:55 PM

                                                                                                                                                          It'll go under the rug as it always does because no one wants to explain that our AI first strategy was a stupid one that caused a net negative ROI impact and reputational damage.

                                                                                                                                                      • Imustaskforhelp

                                                                                                                                                        today at 9:33 AM

                                                                                                                                                        Can you go into more detail, if you're allowed to?

                                                                                                                                                        I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

                                                                                                                                                        They hadn't done anything to the database itself but you betcha that there are some horror stories involving database, lack of proper backups and Vibe-coding gone insanely wrong.

                                                                                                                                                          • cryo32

                                                                                                                                                            today at 10:46 AM

                                                                                                                                                            I can say very little in detail, but basically Claude doesn’t have any conceptual idea of order of operations and transactional guarantees, which resulted in it producing something that failed under normal load. There is an evidence chain suggesting it was asked to do this but did not, and that this wasn’t verified.

                                                                                                                                                            Our engineers are accountable for what they produce regardless of how so they are cleaning up the extensive mess this made. This will result in a very heated post-mortem meeting between the two factions in the company.
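cryo32 can't share specifics, so the following is only a generic illustration of the kind of transactional guarantee mentioned above, not their system. It is a minimal Python `sqlite3` sketch with a hypothetical `accounts` table and hypothetical helpers (`make_db`, `transfer`, `balances`): two dependent writes either both land or neither does. The `with conn:` block commits on success and rolls back if an exception escapes, so a crash between the debit and the credit leaves no half-applied transfer.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    # `with conn:` commits if the block finishes and rolls back if it raises,
    # so the debit and the credit are applied as one unit.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_db()
transfer(conn, "a", "b", 30)
try:
    transfer(conn, "a", "b", 50, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # {'a': 70, 'b': 30}
```

Without the transaction, the failed call would have left `a` debited by 50 with nothing credited to `b`.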

                                                                                                                                                            • ValentineC

                                                                                                                                                              today at 2:39 PM

                                                                                                                                                              > I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

                                                                                                                                                              People like that and their managers should all be put on PIP right away.

                                                                                                                                                              It's not like there is a lack of talent on the market.

                                                                                                                                                              • flir

                                                                                                                                                                today at 12:03 PM

                                                                                                                                                                > "The code wasn't written by me. It was written by Claude/Chatgpt"

                                                                                                                                                                Culturally (across all LLM use, not just programming) we need to nip that in the bud. If we don't it's going to be the new "someone hacked my social media password" get out of jail free card for avoiding responsibility.

                                                                                                                                                                I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.

                                                                                                                                                                  • Imustaskforhelp

                                                                                                                                                                    today at 2:02 PM

                                                                                                                                                                    (Although I was paraphrasing a bit, since I don't remember the exact story, something similar was definitely said.)

                                                                                                                                                                    I agree and I feel that that company in particular's response to that statement was also the same in terms of: you are responsible for your code no matter what and prompted to fire the engineer.

                                                                                                                                                                    but there was also this dual level of hypocrisy from the company as well, in terms of asking the engineers to be 10x'd and putting pressures on it and internal lying by teams on how much productive they really are with AI and many other things in general.

                                                                                                                                                                    I feel like engineers are within pressure of being asked to replace themselves within some (IMO) toxic workplaces by having the expectations of being 10x'd, something which previously was just an hyperbole but is now being expected as reality.

                                                                                                                                                                    As much as I'd like to place the fault on that engineer isolated itself which in some sense you can consider that. I also think of it as a probability of a person like that existing.

                                                                                                                                                                    Within a hyperfocused hyper-growth mentality with few safeguards, full of AI-10x agentic intent-focused engineers (I have exhausted my AI vocabulary), the chance of a person like that existing rises by orders of magnitude, which is probably why I heard a story like that in the first place.

                                                                                                                                                                    This might be one of the reasons I am worried about the hyper-focus on using AI as an everything tool, and the investor/company push to use AI for everything. I have said it elsewhere and I might say it again: if we treat AI as a hammer, then we need to stop treating everything as a nail and force-feeding AI into it. We need to treat a screw as a screw, otherwise we will probably end up with some very messy foundations.

                                                                                                                                                                    I would agree with you on having a cultural norm that this is bad, but unless we also build a cultural norm around the last thing I mentioned, I find it very hard to achieve. That last thing is exactly what the AI companies and everyone else are betting trillions of dollars on: AI being used for everything and anything. I find it hard to expect the culture to change from the top down, especially when that is inverted and managers expect you to build things with AI given the investment.

                                                                                                                                                                    There should be a balance and push-back from engineers alike, but as mitchell has said, even some really smart engineers who should know better are completely within AI psychosis and the philosophy of using AI as a hammer and hammering everything.

                                                                                                                                                                    As such, I would find it hard to bring about that kind of cultural disruption.

                                                                                                                                                                    Would you like to know the disturbing part? Someone who worked at that company was honest and told the higher-ups that they weren't being 10x'd by AI, while all the other engineers said that they were (they were in fact lying and working until 1 AM to finish the work, as AI was ineffective). Management now treats this honest employee as the ineffective one, and it has created a somewhat toxic workplace for them. Imagine asking for cultural disruption when everyone from top to bottom is involved in covering up for AI: investors want to jump in on AI, companies want that sweet investor money, management wants to satisfy the company, engineers want to keep their jobs and keep management happy, and honest people get punished for being honest.

                                                                                                                                                                    This got a bit long, but this is everything wrong with AI. Not really the tech, but everything around it.

                                                                                                                                                                    I hope the culture around this gets better, but it's an uphill battle.

                                                                                                                                                                    On the other hand, I am optimistic, because it seems honesty will matter more when the bubble pops, and hopefully everyone will become more selective about wholesale AI consumption, or more intentional about it. (I am happy with developers building tools and prototypes that they previously couldn't have, and even monetizing them somewhat, as long as they stay honest and remain more than capable of walking away from slopware sunk costs. TL;DR: be authentic/transparent.)

                                                                                                                                                                • smoe

                                                                                                                                                                  today at 12:45 PM

                                                                                                                                                                  > "The code wasn't written by me. It was written by Claude/Chatgpt"

                                                                                                                                                                  That seems like a good way to justify your own job away.

                                                                                                                                                                  • latexr

                                                                                                                                                                    today at 12:02 PM

                                                                                                                                                                    > Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

                                                                                                                                                                    It boggles the mind someone could think that is a valid justification, because ultimately what they’re saying is “I’m useless, what you get from me is the same thing as prompting the model” which still means they would lose their job.

                                                                                                                                                            • comboy

                                                                                                                                                              today at 9:56 AM

                                                                                                                                                              We are running out of things to break.

                                                                                                                                                                • stavros

                                                                                                                                                                  today at 11:28 AM

                                                                                                                                                                  Make more things to break.

                                                                                                                                                              • GL26

                                                                                                                                                                today at 10:53 AM

                                                                                                                                                                If you don't want technical debt, vibe coding is mostly useful for prototyping. For a real product, true SWE will never be replaced.

                                                                                                                                                                  • Otek

                                                                                                                                                                    today at 11:23 AM

                                                                                                                                                                    Already got replaced at world top tier tech jobs. „True SWE” will be niche / luxury soon, just like real woodworking vs IKEA

                                                                                                                                                                      • inigyou

                                                                                                                                                                        today at 11:31 AM

                                                                                                                                                                        Software is freely duplicable unlike wood. IKEA could be mass producing copies of the most beautiful chair in the world just as easily as it produces copies of something a 5-year-old drew in freecad.

                                                                                                                                                                  • throwatdem12311

                                                                                                                                                                    today at 12:14 PM

                                                                                                                                                                    all code is technical debt

                                                                                                                                                            • taspeotis

                                                                                                                                                              today at 11:04 AM

                                                                                                                                                              OpenAI really snatched defeat from the jaws of victory late last year when Claude Code was a laggy mess.

                                                                                                                                                              Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.

                                                                                                                                                                • kasey_junk

                                                                                                                                                                  today at 12:08 PM

                                                                                                                                                                  Fwiw I have the exact opposite experience.

                                                                                                                                                                  • christophilus

                                                                                                                                                                    today at 12:22 PM

                                                                                                                                                                    I find Claude Code nearly unusable. I always have to type in neovim if I’m typing anything more than a few words.

                                                                                                                                                                      • aquariusDue

                                                                                                                                                                        today at 12:54 PM

                                                                                                                                                                        It runs fine for me on an old ThinkPad X220 with 8 GB of RAM, an i5 and a barely working SATA SSD. This is on Fedora, and Claude Code is installed from Anthropic's dnf repo (the latest channel). Granted, I'm on the Pro Plan and I'm not running lots of sub-agents, but the default terminal app from KDE (Konsole) renders and keeps Claude Code responsive enough.

                                                                                                                                                                        Honestly, I must be missing some key piece of the workflow; otherwise I don't know why it would run so slowly for other people on better hardware. Granted, I take care to tell Claude not to exhaust CPU cores and not to trigger OOM errors, akin to "make no mistakes pls".

                                                                                                                                                                    • Lionga

                                                                                                                                                                      today at 2:30 PM

                                                                                                                                                                      [dead]

                                                                                                                                                                  • jofzar

                                                                                                                                                                    today at 9:50 AM

                                                                                                                                                                    This is actually such a classic blunder (shipping trace/debug logging on for everything), but funnily the impact is not in a normal way.

                                                                                                                                                                    It's crazy that we've hit a point where memory, CPU and disk are fast enough that they don't get clapped when a dev ships logging at trace level. It used to be that the application would become catastrophically slow, so it got fixed immediately in the next update.
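For contrast, here is a minimal Python sketch of the usual guard rails against this class of blunder. It assumes nothing about how Codex actually configures its logging; the `APP_LOG_LEVEL` environment variable and the `make_logger` helper are hypothetical. Verbose levels are opt-in (the default is WARNING), and the standard library's `RotatingFileHandler` bounds the on-disk footprint even if someone does turn verbose output on.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def make_logger(path: str, name: str = "app",
                max_bytes: int = 10 * 1024 * 1024, backups: int = 3) -> logging.Logger:
    """Logger whose files stay within roughly max_bytes * (backups + 1) bytes in total."""
    # Hypothetical env var: verbose levels are opt-in, never the shipped default.
    level = logging.getLevelName(os.environ.get("APP_LOG_LEVEL", "WARNING").upper())
    if not isinstance(level, int):  # unknown names such as "TRACE" fall back to WARNING
        level = logging.WARNING

    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Rolls over before a write would push the file past max_bytes and keeps
    # only `backups` rotated files, so disk usage is bounded by design.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With the defaults above, the worst case is roughly 40 MB on disk no matter how chatty the log level is.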

                                                                                                                                                                      • kuekacang

                                                                                                                                                                        today at 10:00 AM

                                                                                                                                                                        It also helps that agent work is done server-side, so you can hog all the local resources for your thin client.

                                                                                                                                                                    • bravetraveler

                                                                                                                                                                      today at 10:53 AM

                                                                                                                                                                      Somebody please donate some tokens to this plucky startup, they need our help.

                                                                                                                                                                      • ramon156

                                                                                                                                                                        today at 7:56 AM

                                                                                                                                                                        Blegh, I puke every time I see obviously AI generated comments in GH PR's. You cannot assume any of these people have done their research, other than telling Codex to do it for them

                                                                                                                                                                          • b--l

                                                                                                                                                                            today at 7:59 AM

                                                                                                                                                                            It's because they use gpt-5.5-xhigh (the money making* model) to build it.

                                                                                                                                                                            (*for them)

                                                                                                                                                                          • joelthelion

                                                                                                                                                                            today at 4:19 PM

                                                                                                                                                                            A good moment to switch to an open solution like opencode or pi.

                                                                                                                                                                            • purpleidea

                                                                                                                                                                              today at 10:19 AM

                                                                                                                                                                              I want to like codex, but the quality is just not very good, especially when compared to Claude.

                                                                                                                                                                              It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.

                                                                                                                                                                              No response, no workaround.

                                                                                                                                                                              https://github.com/openai/codex/issues/23762

                                                                                                                                                                                • newtwilly

                                                                                                                                                                                  today at 5:15 PM

                                                                                                                                                                                  Decent sandbox + sandbox override experience with pi coding agent... pi-sandbox uses the same sandbox tech that claude code uses, although it uses a fork that's a little behind, and I'm not sure exactly why it uses a fork.

                                                                                                                                                                                  You can install pi, then install pi-sandbox pinned to the current version. This issue describes how pi-sandbox plus an additional extension gives you a setup where the sandbox is used by default, but you can fall back to running unsandboxed with approval required: https://github.com/carderne/pi-sandbox/issues/50

                                                                                                                                                                                  • christophilus

                                                                                                                                                                                    today at 12:25 PM

                                                                                                                                                                                    I don’t trust any agent to respect any boundaries. They might today. But tomorrow’s vibe coded slip update might break it in subtle ways.

                                                                                                                                                                                    My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).

                                                                                                                                                                                      • drakythe

                                                                                                                                                                                        today at 3:34 PM

                                                                                                                                                                                        They can't respect boundaries as long as those boundaries exist only in the LLM instruction set. When a human follows rules long enough, the rules (usually) become second nature, almost to the point where long-running companies are known for having rules no one understands (Chesterton's Fence is alive and well).

                                                                                                                                                                                        But an LLM has a limited "memory". The instructions might land in there with sufficient priority to be "respected", but it only takes one instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistically "best" solution, and any barriers that exist only in LLM instructions rather than in hardcoded guards will evaporate like so much morning fog.

                                                                                                                                                                                        • matheusmoreira

                                                                                                                                                                                          today at 1:34 PM

                                                                                                                                                                                          I went the full virtual machine route. Just finished hardening the setup and firewalling it off my local network. Not perfect but it does make me feel much safer.

                                                                                                                                                                                    • altcognito

                                                                                                                                                                                      today at 1:18 PM

                                                                                                                                                                                      I think part of the question should be, why is there no QA or test that catches this? It's one thing to be slopware, but why didn't anything run a test that catches this?

                                                                                                                                                                                        • theowaway213456

                                                                                                                                                                                          today at 1:25 PM

                                                                                                                                                                                          Every time you write a test that handles some data, you write an assertion about how much data is handled?

                                                                                                                                                                                          Come on, this is such an easy thing to forget to test. Don't act like there is some magical testing strategy that would have caught this

                                                                                                                                                                                            • altcognito

                                                                                                                                                                                              today at 1:43 PM

                                                                                                                                                                                              I'll acknowledge that this is probably not likely to get caught.

                                                                                                                                                                                              Integration testing could/should catch this, especially for a client side app.

                                                                                                                                                                                              Simple constraints are a good thing: "Our app shouldn't use more than 50 MB of RAM or 3 GB of disk space."
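A minimal sketch of that kind of budget check in Python, assuming the integration harness can point the app at a scratch data directory and run a representative session first. `dir_size`, `check_disk_budget` and the directory layout are hypothetical, and the 3 GB figure is just the example number from the comment.

```python
import os

DISK_BUDGET_BYTES = 3 * 1024**3  # the "3 GB of disk space" example budget


def dir_size(root: str) -> int:
    """Total size in bytes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def check_disk_budget(data_dir: str, budget: int = DISK_BUDGET_BYTES) -> None:
    """Fail loudly if the app's data directory grew past its budget during a test run."""
    used = dir_size(data_dir)
    if used > budget:
        raise AssertionError(f"{data_dir} uses {used} bytes, budget is {budget}")
```

The point of the design is that the assertion is about the app's total footprint, not about any particular data path, so a runaway log file fails the test even if nobody thought to test logging.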

                                                                                                                                                                                          • java-man

                                                                                                                                                                                            today at 5:17 PM

                                                                                                                                                                                            what QA?

                                                                                                                                                                                        • ares623

                                                                                                                                                                                          today at 8:09 AM

                                                                                                                                                                                          i hope they find the smoking gun, the key insight, the kicker.

                                                                                                                                                                                            • 59nadir

                                                                                                                                                                                              today at 8:13 AM

                                                                                                                                                                                              Then they can apply a clean solve, the cleanest solution.

                                                                                                                                                                                              It's fascinating how offensive some of this verbiage becomes to you when you see it attached to LLM output too much.

                                                                                                                                                                                                • jofzar

                                                                                                                                                                                                  today at 9:31 AM

                                                                                                                                                                                                  Ugh, this one gets me so bad. Same with "wire" and "wired": everything is wired to something.

                                                                                                                                                                                                    • never_inline

                                                                                                                                                                                                      today at 4:54 PM

                                                                                                                                                                                                      that's a real gap

                                                                                                                                                                                              • wrxd

                                                                                                                                                                                                today at 12:18 PM

                                                                                                                                                                                                At least they could call someone who’s absolutely right, so that the tool can see its mistakes now.

                                                                                                                                                                                            • sigbottle

                                                                                                                                                                                              today at 11:33 AM

                                                                                                                                                                                              I have noticed absurd lag from the browser usage and sometimes complete bricking of my network too on my computer. I thought it was just my computer getting old, but possibly it's ChatGPT.

                                                                                                                                                                                              • xfgong

                                                                                                                                                                                                today at 11:33 AM

                                                                                                                                                                                                Same issue with Claude Code btw — it writes massive debug logs to ~/.claude/logs. Had to symlink it to a tmpfs to stop wearing out my SSD.

                                                                                                                                                                                              • dundercoder

                                                                                                                                                                                                today at 8:06 AM

                                                                                                                                                                                                If something like this is helpful or necessary, that’s what RAM-backed tmpfs is for.

                                                                                                                                                                                                  • mrweasel

                                                                                                                                                                                                    today at 8:41 AM

                                                                                                                                                                                                    Using a RAM backed tmpfs would be a work-around as to not trash your SSD. It's doesn't fix underlying problem. It's incredibly poor design on OpenAIs part.

                                                                                                                                                                                                • bob1029

                                                                                                                                                                                                  today at 8:38 AM

                                                                                                                                                                                                  I'm struggling with how this much logging information could be generated at any level of verbosity. Is codex writing log entries while it's sitting idle? Why would someone want to look at these logs?

                                                                                                                                                                                                    • taosu_la

                                                                                                                                                                                                      today at 9:42 AM

                                                                                                                                                                                                      Can someone tell me if the current sub-agent of codex is available now? There used to always be a spinning issue.

                                                                                                                                                                                                      • indiv0

                                                                                                                                                                                                        today at 8:10 AM

                                                                                                                                                                                                        This thread will become a typical "haha slop company made slop" thread, but I've been bitten by a bug exactly like this before in a (pre-AI, artisan) OSS project. The maintainer didn't properly account for DST when calculating the last backup time, so the app started writing and rewriting backups continuously and never stopped.

                                                                                                                                                                                                        Perhaps the framing shouldn't be "haha slop" but rather why doesn't the AI write better quality software than we do? To which the answer is obvious IMO -- even emergent properties can't elevate AI intelligence too far above the training dataset. So how do we get to superintelligent (or at least "not-wreck-your-NVMe-endurance-telligent") AI, if we, as a whole, are not smart enough ourselves?

                                                                                                                                                                                                        Judge not the slop-bot, lest ye be judged yourself, engineer.
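The DST backup bug above isn't described in detail, so this Python snippet only illustrates the general pitfall, with hard-coded US Eastern offsets (EST = UTC-5, EDT = UTC-4) as an assumption: subtracting naive local wall-clock times across a DST switch gives the wrong elapsed time, while subtracting timezone-aware instants gives the right one.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def naive_elapsed(last: datetime, now: datetime) -> timedelta:
    """Buggy: subtracts wall-clock readings, ignoring the change in UTC offset."""
    return now.replace(tzinfo=None) - last.replace(tzinfo=None)


def real_elapsed(last: datetime, now: datetime) -> timedelta:
    """Correct: aware datetimes are compared as absolute instants (i.e. in UTC)."""
    return now.astimezone(timezone.utc) - last.astimezone(timezone.utc)


# US "fall back" night: clocks go from 01:59 EDT back to 01:00 EST.
last_backup = datetime(2024, 11, 3, 1, 45, tzinfo=EDT)  # 05:45 UTC
now = datetime(2024, 11, 3, 1, 15, tzinfo=EST)          # 06:15 UTC, 30 minutes later

print(naive_elapsed(last_backup, now))  # -1 day, 23:30:00 (minus 30 minutes)
print(real_elapsed(last_backup, now))   # 0:30:00
```

A scheduler that compares the naive value against its backup interval can misfire around the switch; storing and comparing UTC timestamps sidesteps the whole class of problem.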

                                                                                                                                                                                                          • Zenul_Abidin

                                                                                                                                                                                                            today at 5:21 PM

                                                                                                                                                                                                            I've been bitten by this bug for several days, to the point where I had had to write a script to delete the WAL so that my server would stop getting locked up from a lack of disk space from codex logging.

                                                                                                                                                                                                            You can find it here: https://github.com/openai/codex/issues/28224#issuecomment-47...

                                                                                                                                                                                                            I have been making noise about this bug for a week, so I'm glad to see this is blowing up on HN.

                                                                                                                                                                                                            • sleples

                                                                                                                                                                                                              today at 8:33 AM

                                                                                                                                                                                                              We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.

                                                                                                                                                                                                                • klibertp

                                                                                                                                                                                                                  today at 1:43 PM

                                                                                                                                                                                                                  A single human does (or tends to). Humans as a group, with members joining and leaving over time, also learn, but at a much slower pace, on a timeframe of years to decades. "X things programmers should know about Y" is a template for quite a few very influential blog posts, yet for most of them, you find many programmers, even decades later, who don't actually know what they "should".

                                                                                                                                                                                                                  My experience was always that 90% of code is ugly and clunky. I'm not at all surprised, while reviewing AI-generated code, to see much of the same ugliness we regularly commit. The quality of the output code is now consistently average, which means it's basically shit in 90% of cases, but it tends to mostly work (in the general case). It's the same kind of shit I've seen people push to production thousands of times in my career.

                                                                                                                                                                                                                  We don't fully know how to write good code. We don't really understand what good code should objectively look like. Spending more time on code doesn't automatically lead to better code (but costs a lot more). Above all, we don't need good code - the business side is perfectly fine with "good enough right now" rather than "maybe a lot better half a year from now". And that's what the models are trained on. They would, indeed, need quite a lot of "emergent properties" to go from that to consistently good code. ASI-level properties, I suspect.

                                                                                                                                                                                                                  • SilverSlash

                                                                                                                                                                                                                    today at 9:05 AM

                                                                                                                                                                                                                    > Difference is, humans learn from their mistakes.

                                                                                                                                                                                                                    Great! So next time the human will prompt the agent to watch out for and avoid this bug.

                                                                                                                                                                                                                      • sdesol

                                                                                                                                                                                                                        today at 1:20 PM

                                                                                                                                                                                                                        > Great! So next time the human will prompt the agent to watch out for and avoid this bug.

                                                                                                                                                                                                                        I actually created a system for something like this. The basic idea is, once you have identified what the issue was and fixed it, you can create lessons that lives inside the repository. Lessons are designed to be mapped to one or more files so if the LLM changes the files again, they can see what the issue was.

                                                                                                                                                                                                                        The main challenge is being able to summarize and create proper tags so the AI after any code change can easily find the lesson.
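The comment doesn't describe how lessons are stored, so this Python sketch is purely hypothetical: a `Lesson` record holding a summary, the file glob patterns it applies to, and tags, plus one lookup by changed file and a fallback lookup by tag. Note that `fnmatch` is not path-aware, so `*` also matches across `/`.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Lesson:
    """Hypothetical lesson record stored alongside the code it describes."""
    summary: str
    file_patterns: list[str]
    tags: set[str] = field(default_factory=set)


def lessons_for_changes(lessons: list[Lesson], changed_files: list[str]) -> list[Lesson]:
    """Lessons whose file patterns match at least one file in the change set."""
    return [
        lesson
        for lesson in lessons
        if any(fnmatch(path, pattern)
               for path in changed_files
               for pattern in lesson.file_patterns)
    ]


def lessons_by_tag(lessons: list[Lesson], tag: str) -> list[Lesson]:
    """Tag search, for when a change touches files no lesson is mapped to yet."""
    return [lesson for lesson in lessons if tag in lesson.tags]


# Illustrative entries only; paths and wording are made up.
lessons = [
    Lesson("Trace-level logging was left on in release builds", ["src/log/*"], {"logging", "disk"}),
    Lesson("Backup timer misfired across a DST switch", ["src/backup/*.py"], {"time", "dst"}),
]

for lesson in lessons_for_changes(lessons, ["src/log/config.rs"]):
    print(lesson.summary)  # prints only the logging lesson
```

The file-pattern lookup is the cheap, deterministic path; tags are the fuzzy fallback, which is where the summarizing and tagging challenge from the comment comes in.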

                                                                                                                                                                                                                        • ponector

                                                                                                                                                                                                                          today at 9:22 AM

                                                                                                                                                                                                                          You are a senior developer. Please do no mistakes!

                                                                                                                                                                                                                  • xpct

                                                                                                                                                                                                                    today at 9:38 AM

                                                                                                                                                                                                                    Lack of accountability is the cause here. People don't think before hitting the 'Publish' button. Their managers let them off the hook because the culture still allows making egregious mistakes, as long as there's an LLM to blame.

                                                                                                                                                                                                                    • applfanboysbgon

                                                                                                                                                                                                                      today at 8:32 AM

                                                                                                                                                                                                                      1. I bet that developer only made that mistake one time in their life. Humans learn from their mistakes, LLMs don't. If you rely on LLMs to generate all of your code, you can expect to run into the same issues again and again.

                                                                                                                                                                                                                      2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.

                                                                                                                                                                                                                        • matharmin

                                                                                                                                                                                                                          today at 9:10 AM

                                                                                                                                                                                                                          LLMs do learn from mistakes. Not as directly from individual mistakes like humans do, but in aggregate the models have improved much more in the last year than most humans I know learn in the same time.

                                                                                                                                                                                                                            • xpct

                                                                                                                                                                                                                              today at 9:44 AM

                                                                                                                                                                                                                              I don't like the reframing of 'learning from mistakes' from a human-like, near instantaneous feedback loop, to a year-long process of retraining on many traces collected from user data. They're different concepts and we should refer to them using different phrasing.

                                                                                                                                                                                                                              • Y-bar

                                                                                                                                                                                                                                today at 11:15 AM

                                                                                                                                                                                                                                How many more times do I have to add variations of ”do not run any commands for the application without first entering the running container at `docker compose …`” to my AGENTS.md before it learns that node and phpunit is not available outside these containers?

                                                                                                                                                                                                                            • lifthrasiir

                                                                                                                                                                                                                              today at 8:49 AM

                                                                                                                                                                                                                              > I have never written a bug that has destroyed my users' hardware, ...

                                                                                                                                                                                                                              Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought---or reasoned---so. Maybe the decision was right at that time but the amount of TRACE logs have increased enormously. You will never know.

                                                                                                                                                                                                                                • applfanboysbgon

                                                                                                                                                                                                                                  today at 9:03 AM

                                                                                                                                                                                                                                  I love that we've moved the goalposts from "LLMs are better than artisanal software engineers" to "actually, shipping hardware-destroying bugs in production is literally unavoidable, nobody could possibly avoid doing it".

                                                                                                                                                                                                                                    • lifthrasiir

                                                                                                                                                                                                                                      today at 9:11 AM

                                                                                                                                                                                                                                      I only meant what I said. After all the OP's thesis was that LLMs aren't better than artisanal software engineers, are they? There was no goalpost to move at least in this particular thread. And the solution might be another agent monitoring those oft-ignored signals.

                                                                                                                                                                                                                          • da_grift_shift

                                                                                                                                                                                                                            today at 9:05 AM

                                                                                                                                                                                                                            What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.

                                                                                                                                                                                                                              • fn-mote

                                                                                                                                                                                                                                today at 9:33 AM

                                                                                                                                                                                                                                I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

                                                                                                                                                                                                                                Your comment, on the other hand, would be improved by including your own opinion on the matter.

                                                                                                                                                                                                                                      • gruez

                                                                                                                                                                                                                                        today at 12:43 PM

                                                                                                                                                                                                                                        > Each was clearly written by an experienced dev

                                                                                                                                                                                                                                        /s?

                                                                                                                                                                                                                                        They're clearly AI generated

                                                                                                                                                                                                                            • rvz

                                                                                                                                                                                                                              today at 8:16 AM

                                                                                                                                                                                                                              The first of many bugs that are beyond the complexity of its authors, thanks to comprehension debt.

                                                                                                                                                                                                                              Even with tests, the more complex the code base is, the more risky it is to vibe-code on it without introducing more bugs [0] and increasing the debt. Does not matter if the CI is green or if all the tests pass.

                                                                                                                                                                                                                              It gets even worse if you can't explain the change / pull request or what the implications are after applying that "suggested" fix.

                                                                                                                                                                                                                              [0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

                                                                                                                                                                                                                                • HPsquared

                                                                                                                                                                                                                                  today at 8:43 AM

                                                                                                                                                                                                                                  There are going to be sooooo many consulting opportunities after this wave.

                                                                                                                                                                                                                              • hun3

                                                                                                                                                                                                                                today at 9:14 AM

                                                                                                                                                                                                                                The operating system has historically trusted the applications not to do dumb things too much.

                                                                                                                                                                                                                                Only now we're witnessing the consequences much more frequently thanks to accelerated slop.

                                                                                                                                                                                                                                  • skydhash

                                                                                                                                                                                                                                    today at 11:59 AM

                                                                                                                                                                                                                                    > The operating system has historically trusted the applications not to do dumb things too much.

                                                                                                                                                                                                                                    The OS is a thin layer providing an abstract and consistent interface regardless of the hardware configuration. Policing applications is mostly related to security and resources utilization, not moronic errors.

                                                                                                                                                                                                                              • abihordun

                                                                                                                                                                                                                                today at 9:54 AM

                                                                                                                                                                                                                                SQLite + unbounded TRACE logs = firehose in a bathtub. No rotation, no cap, no surprise. The RAISE(IGNORE) fix patches a design flaw. OpenAI's silence is worse than the bug.
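For readers unfamiliar with the SQLite side of this, here is a minimal Python sketch of the two mechanisms in play: a `BEFORE INSERT` trigger that uses `RAISE(IGNORE)` to silently discard TRACE rows, and a cap that deletes the oldest rows after each insert. The `logs` table, its columns, and the `make_log_db`/`log` helpers are hypothetical; this is not Codex's schema or its actual fix.

```python
import sqlite3

def make_log_db(max_rows: int = 1000) -> sqlite3.Connection:
    """Create an in-memory log store (hypothetical schema) that
    discards TRACE rows and keeps only the newest `max_rows` rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE logs (
            id      INTEGER PRIMARY KEY,
            level   TEXT NOT NULL,
            message TEXT NOT NULL
        );

        -- RAISE(IGNORE) abandons the triggering INSERT without an
        -- error, so TRACE rows are silently never written.
        CREATE TRIGGER drop_trace_logs
        BEFORE INSERT ON logs
        WHEN NEW.level = 'TRACE'
        BEGIN
            SELECT RAISE(IGNORE);
        END;

        -- After each insert, delete every row older than the
        -- max_rows-th newest one, so the table cannot grow unbounded.
        CREATE TRIGGER cap_logs
        AFTER INSERT ON logs
        BEGIN
            DELETE FROM logs
            WHERE id < (SELECT id FROM logs
                        ORDER BY id DESC
                        LIMIT 1 OFFSET {max_rows - 1});
        END;
    """)
    return conn

def log(conn: sqlite3.Connection, level: str, message: str) -> None:
    """Insert one log row; TRACE rows are dropped by the trigger."""
    with conn:  # commits the insert (and trigger effects) on success
        conn.execute(
            "INSERT INTO logs (level, message) VALUES (?, ?)",
            (level, message),
        )

# Usage: 10 TRACE + 10 INFO inserts into a store capped at 3 rows.
conn = make_log_db(max_rows=3)
for i in range(10):
    log(conn, "TRACE", f"noise {i}")
    log(conn, "INFO", f"event {i}")
rows = conn.execute("SELECT level, message FROM logs ORDER BY id").fetchall()
print(rows)  # [('INFO', 'event 7'), ('INFO', 'event 8'), ('INFO', 'event 9')]
```

While the table holds fewer than `max_rows` rows, the subquery returns NULL and nothing is deleted. The `OFFSET` subquery walks the newest `max_rows` entries on every insert, so a real log store would more likely prune in batches or rotate files instead.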

                                                                                                                                                                                                                                • consp

                                                                                                                                                                                                                                  today at 7:39 AM

                                                                                                                                                                                                                                  Why didn't the review process spot this obvious error? Oh wait ... @codex review this

                                                                                                                                                                                                                                    • cedws

                                                                                                                                                                                                                                      today at 9:16 AM

                                                                                                                                                                                                                                      Moreover why isn't the bug fixed already? I thought programmers were obsolete now. Surely one of the leading AI labs has figured out full automation of software development end-to-end by now if that's so.

                                                                                                                                                                                                                                      • charcircuit

                                                                                                                                                                                                                                        today at 8:01 AM

                                                                                                                                                                                                                                        Because it's not an error. The software is working as the creators intended. The diagnostic data (trace logs) are intentionally being saved for debug purposes.

                                                                                                                                                                                                                                      • whalesalad

                                                                                                                                                                                                                                        today at 1:18 PM

                                                                                                                                                                                                                                        Yikes. I have a habit of leaving sessions open for a long time. I just ran `sudo iotop` to watch live disk activity and sure enough all my idle codex sessions were spinning away writing god knows what constantly to disk.

                                                                                                                                                                                                                                        • Imustaskforhelp

                                                                                                                                                                                                                                          today at 7:43 AM

                                                                                                                                                                                                                                          I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

                                                                                                                                                                                                                                          One can argue that these products are the flagship products of their respective AI companies aside from the AI models themselves of course.

                                                                                                                                                                                                                                           I imagine that this story will be picked up by the news left and right; some stories just feel this way, and this one is like that (given 12 upvotes on HN in 7 minutes).

                                                                                                                                                                                                                                           The only logical conclusions (from this incident) that I can draw are:

                                                                                                                                                                                                                                           1. A (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.

                                                                                                                                                                                                                                           2. Proper testing and taking issues seriously are key if you still wish to do this, and there isn't much of either here. This is a week-old issue, which I can only classify as severe.

                                                                                                                                                                                                                                           I wish to keep a nuanced opinion about it, but oh, this is bad for OpenAI (not as bad as them accepting autonomous AI within drones and mass surveillance, though).

                                                                                                                                                                                                                                           My point is: AI has uphills as well as downward valleys and cliffs. It may accelerate you, but possibly toward your downfall. It's recommended to keep an eye on the road while driving and not to drive too fast.

                                                                                                                                                                                                                                          AI companies might be like car companies which don't offer a brake pedal.

                                                                                                                                                                                                                                            • dathinab

                                                                                                                                                                                                                                              today at 8:12 AM

                                                                                                                                                                                                                                              > I don't understand how Codex can blunder so badly.

                                                                                                                                                                                                                                               because they trust the AI too much (and seem to be fine with acting knowingly negligent)

                                                                                                                                                                                                                                               the problem is:

                                                                                                                                                                                                                                               - AI tends to produce very convincing-looking code, even if it's completely wrong

                                                                                                                                                                                                                                               - AI makes mistakes of a kind no human would make, at least no human who is also able to write convincing-looking code

                                                                                                                                                                                                                                               - code reviews are hard: a lot of devs, including senior devs, put a lot of implicit trust in their co-workers behaving "sane and non-malicious". But AIs sometimes behave not so sanely, and (wrt. trying to be convincing) not so benignly either. In the worst case they act in ways that, if it were a human, you might consider an attempt to maliciously sabotage the product

                                                                                                                                                                                                                                               Like a "dumb" example from work:

                                                                                                                                                                                                                                               - AI randomly removes an HTML element id while making other changes in jsx/react

                                                                                                                                                                                                                                               - the PR has a lot of changes, and the id removal line looks innocent, like some on-the-fly cleanup

                                                                                                                                                                                                                                               - human reviewers have a bad tendency not to look closely at deleted lines, except when they need a reference for what a new line replaced (but here it's only a deleted line, with no new line)

                                                                                                                                                                                                                                               - you don't expect humans to randomly, without reason, delete important properties of components while changing other things

                                                                                                                                                                                                                                               - you maybe would still have found it, but it's an emergency fix for a production issue

                                                                                                                                                                                                                                               - it happens not to be covered by integration tests, but it still matters a lot for one specific flow that is important for complicated reasons and not properly tested (similarly, people tend not to test logging much: at best the presence of needed info, but hardly ever the absence of noise)

                                                                                                                                                                                                                                              • espdev

                                                                                                                                                                                                                                                today at 3:05 PM

                                                                                                                                                                                                                                                > I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

                                                                                                                                                                                                                                                I'd say this is also partly a problem of working under intense pressure and the demand to work faster and faster - even faster now with "AI". All these companies are competing with each other very aggressively and are driving their employees like horses in order to win the "AI" race.

                                                                                                                                                                                                                                                • bakugo

                                                                                                                                                                                                                                                  today at 2:42 PM

                                                                                                                                                                                                                                                  "Vibe coding" implies minimal to no human involvement. It doesn't matter how good of an engineer the person who typed the prompt was, they were not involved in writing or reviewing the code, so the end result will not reflect their skill. The whole point of vibe coding is making software engineers irrelevant.

                                                                                                                                                                                                                                                  People like to go on about how "good engineers review their AI code" but that's just not what's happening in reality. Not only is reviewing large amounts of AI generated code unpleasant and mentally taxing, it also negates most of the perceived productivity boost, so people are simply not doing it.

                                                                                                                                                                                                                                                  > Proper testing

                                                                                                                                                                                                                                                  There is no formal testing that would be expected to catch an issue like this. It can barely be classified as a bug, the logging is working as intended, just with negative side effects that weren't accounted for.

                                                                                                                                                                                                                                                  The only real way to proactively prevent an issue like this is for a human programmer to stop and think about this code as they're writing it and go "hmm, we're logging large amounts of data to disk at a fast pace here, this may be a bad idea". Without human involvement, this is just going to keep happening. All vibe coded software is bloated and unstable, I have yet to see a single counter-example.

                                                                                                                                                                                                                                                  • PunchyHamster

                                                                                                                                                                                                                                                    today at 8:07 AM

                                                                                                                                                                                                                                                    > I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

                                                                                                                                                                                                                                                    Because it was deemed not Hard Enough task for real engineer to look at, so AI was sent to do it with no supervision, just checking the effects.

                                                                                                                                                                                                                                                    Also overly excessive logging is probably useful to them in chasing some of the edge cases, the cost to users doesn't matter in the slightest to them

                                                                                                                                                                                                                                                    • supriyo-biswas

                                                                                                                                                                                                                                                      today at 8:11 AM

                                                                                                                                                                                                                                                      The truth of the matter is that any time that has been saved in writing the code must be spent on ensuring proper system design, reviewing the code, and most importantly of all, QA, which is an uncomfortable discussion for AI techbros who are peddling complete automation of the software profession.

                                                                                                                                                                                                                                                  • today at 8:20 AM

                                                                                                                                                                                                                                                    • vantareed

                                                                                                                                                                                                                                                      today at 7:30 AM

                                                                                                                                                                                                                                                      [flagged]

                                                                                                                                                                                                                                                      • joka88xj

                                                                                                                                                                                                                                                        today at 11:52 AM

                                                                                                                                                                                                                                                        [flagged]

                                                                                                                                                                                                                                                        • akitowerns

                                                                                                                                                                                                                                                          today at 3:46 PM

                                                                                                                                                                                                                                                          [flagged]