
We tasked Opus 4.6 using agent teams to build a C Compiler

265 points - today at 7:07 PM

Source
  • ndesaulniers

    today at 9:41 PM

    I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel. https://clangbuiltlinux.github.io/

    This LLM did it in (checks notes):

    > Over nearly 2,000 Claude Code sessions and $20,000 in API costs

    It may build, but does it boot (was also a significant and distinct next milestone)? (Also, will it blend?). Looks like yes!

    > The 100,000-line compiler can build a bootable Linux 6.9 on x86, ARM, and RISC-V.

    The next milestone is:

    Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.

    > The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

    Still a really cool project!

      • beambot

        today at 10:15 PM

        This is getting close to a Ken Thompson "Trusting Trust" era -- AI could soon embed itself into the compilers themselves.

          • bopbopbop7

            today at 10:19 PM

            A pay to use non-deterministic compiler. Sounds amazing, you should start.

              • Aurornis

                today at 10:36 PM

                Application-specific AI models can be much smaller and faster than the general purpose, do-everything LLM models. This allows them to run locally.

                They can also be made to be deterministic. Some extra care is required to avoid computation paths that lead to numerical differences on different machines, but this can be accomplished reliably with small models that use integer math and use kernels that follow a specific order of operations. You get a lot more freedom to do these things on the small, application-specific models than you do when you're trying to run a big LLM across different GPU implementations in floating point.
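
The order-of-operations point can be illustrated with a minimal sketch. It is not taken from any model runtime; the sample values and the `SCALE` factor are assumptions chosen only to make the effect visible. Floating-point addition is not associative, so the same numbers summed in a different order can give a different result, while integer (fixed-point) accumulation is exact and order-independent.

```python
# Illustrative values only: two huge terms that cancel, plus two small ones.
values = [1e16, 1.0, -1e16, 1.0]
reordered_values = [1e16, -1e16, 1.0, 1.0]

def float_sum(xs):
    """Accumulate left to right in double-precision floating point."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Hypothetical fixed-point scale: each value is stored as an integer
# count of 1/SCALE units.
SCALE = 1000

def fixed_point_sum(xs):
    """Accumulate as integers; integer addition is exact and associative."""
    total = 0
    for x in xs:
        total += int(round(x * SCALE))
    return total

forward = float_sum(values)               # the first 1.0 is lost to rounding
reordered = float_sum(reordered_values)   # both 1.0 terms survive

fixed_forward = fixed_point_sum(values)
fixed_reordered = fixed_point_sum(reordered_values)
```

Here `forward` and `reordered` differ even though both lists hold the same numbers, while the two fixed-point sums agree. Real kernels face the same issue when reductions are split across threads in varying order.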

                • ndesaulniers

                  today at 10:26 PM

                  Some people care more about compile times than the performance of generated code. Perhaps even the correctness of generated code. Perhaps more so than determinism of the generated code. Different people in different contexts can have different priorities. Trying to make everyone happy can sometimes lead to making no one happy. Thus dichotomies like `-O2` vs `-Os`.

                  EDIT (since HN is preventing me from responding):

                  > Some people care more about compiler speed than the correctness?

                  Yeah, I think plenty of people writing code in languages that have concepts like Undefined Behavior technically don't care as much about correctness as they may claim, as it's pretty hard to write large volumes of code without indirectly relying on UB somewhere. What is correct in such a case was left up to the interpretation of the implementer by ISO WG14.

                    • bopbopbop7

                      today at 10:33 PM

                      Some people care more about compiler speed than the correctness? I would love to meet these imaginary people that are fine with a compiler that is straight up broken. Emitting working code is the baseline, not some preference slider.

                      • chasd00

                        today at 10:36 PM

                        a compiler introducing bugs into code it compiles is a nightmare thankfully few have faced. The only thing worse would be a CPU bug like the legendary Pentium bug. Imagine you compile something like Postgres only to have it crash in some unpredictable way. How long do you stare at Postgres source before suspecting the compiler? What if this compiler was used to compile code in software running all over cloud stacks? Bugs in compilers are very bad news, they have to be correct.

                • ndesaulniers

                  today at 10:31 PM

                  We're already starting to see people experimenting with applying AI towards register allocation and inlining heuristics. I think that many fields within a compiler are still ripe for experimentation.

                  https://llvm.org/docs/MLGO.html

              • shakna

                today at 9:59 PM

                > Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

                Does it really boot...?

                  • ndesaulniers

                    today at 10:11 PM

                    > Does it really boot...?

                    They don't need 16b x86 support for the RISCV or ARM ports, so yes, but depends on what 'it' we're talking about here.

                    Also, FWIW, GCC doesn't directly assemble to machine code either; it shells out to GAS (GNU Assembler). This blog post calls it "GCC assembler and linker" but to be more precise the author should edit this to "GNU binutils assembler and linker." Even then GNU binutils contains two linkers (BFD and GOLD), or did they excise GOLD already (IIRC, there was some discussion a few years ago about it)?
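
The split described above can be sketched by spelling out the two steps by hand. This is only an illustration: the file name `hello.c` is hypothetical, and the sketch builds the command lines without running them. `gcc -S` stops after producing assembly text, and `as` (the GNU assembler from binutils) turns that text into an object file.

```python
from pathlib import Path

def driver_steps(source):
    """Return the commands for compiling to assembly, then assembling.

    Nothing is executed here; the commands are only constructed.
    """
    src = Path(source)
    asm = src.with_suffix(".s")
    obj = src.with_suffix(".o")
    return [
        # Step 1: the compiler proper emits assembly text, not machine code.
        ["gcc", "-S", str(src), "-o", str(asm)],
        # Step 2: the assembler converts the assembly into an object file.
        ["as", str(asm), "-o", str(obj)],
    ]

steps = driver_steps("hello.c")  # hypothetical input file
```

Linking is left out on purpose, since a bare `ld` invocation needs startup files and library paths that the driver normally supplies.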

                      • shakna

                        today at 10:34 PM

                        Yeah, didn't mention gas or ld, for similar reasons. I agree that a compiler doesn't necessarily "need" those.

                        I don't agree that all the claims are backed up by their own comments, which means that there's probably other places where it falls down.

                        It's... Misrepresentation.

                        Like Chicken is a Scheme compiler. But they're very up front that it depends on a C compiler.

                        Here, they wrote a C compiler that is at least sometimes reliant on having a different C compiler around. So is the project at 50%? 75%?

                        Even if it's 99%, that's not the same story as they tried to write. And if they wrote that tale instead, it would be more impressive, rather than "There's some holes. How many?"

                          • Philpax

                            today at 10:51 PM

                            Their C compiler is not reliant on having another C compiler around. Compiling the 16-bit real mode bootstrap for the Linux kernel on x86(-64) requires another C compiler; you certainly don't need another compiler to compile the kernel for another architecture, or to compile another piece of software not subject to the 32k constraint.

                            The compiler itself is entirely functional; it just can't generate code optimal enough to fit within the constraints for that very specific (tiny!) part of the system, so another compiler is required to do that step.

                • zaphirplane

                  today at 10:02 PM

                  What were the challenges, out of interest? Was some of it the use of GCC extensions, which needed an equivalent and porting over to that equivalent?

                    • ndesaulniers

                      today at 10:17 PM

                      `asm goto` was the big one. The x86_64 maintainers broke the clang builds very intentionally just after we had gotten x86_64 building (with necessary patches upstreamed) by requiring compiler support for that GNU C extension. This was right around the time of meltdown+spectre, and the x86_64 maintainers didn't want to support fallbacks for older versions of GCC (and ToT Clang at the time) that lacked `asm goto` support. `asm goto` requires plumbing throughout the compiler, and I've learned more about register allocation than I particularly care...

                      Fixing some UB in the kernel sources, lots of plumbing to the build system (particularly making it more hermetic).

                      Getting the rest of the LLVM binutils substitutes to work in place of GNU binutils was also challenging. Rewriting a fair amount of 32b ARM assembler to be "unified syntax" in the kernel. Linker bugs are hard to debug. Kernel boot failures are hard to debug (thank god for QEMU+gdb protocol). Lots of people worked on many different parts here, not just me.

                      https://github.com/ClangBuiltLinux/linux/issues for a good historical perspective. https://github.com/ClangBuiltLinux/linux/wiki/Talks,-Present... for talks on the subject. Keynoting LLVM conf was a personal highlight (https://www.youtube.com/watch?v=6l4DtR5exwo).

                  • phillmv

                    today at 9:56 PM

                    i mean… your work also went into the training set, so it's not entirely surprising that it spat a version back out!

                      • underdeserver

                        today at 9:59 PM

                        Anthropic's version is in Rust though, so at least a little different.

                          • ndesaulniers

                            today at 10:22 PM

                            There's parts of LLVM architecture that are long in the tooth (IMO) (as is the language it's implemented in, IMO).

                            I had hoped one day to re-implement parts of LLVM itself in Rust; in particular, I've been very curious whether we can compile C concurrently (and parse C in parallel, or lazily), approaches that haven't been explored in LLVM and that I think might be safer to do in Rust. I don't know enough about grammars to know if it's technically impossible, but a healthy dose of ignorance can sometimes lead to breakthroughs.

                            LLVM is pretty well designed for test. I was able to implement a lexer for C in Rust that could lex the Linux kernel, and use clang to cross check my implementation (I would compare my interpretation of the token stream against clang's). Just having a standard module system makes reusable pieces seem like perhaps a better way to compose a toolchain, but maybe folks with more experience with rustc have scars to disagree?
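
The cross-checking idea can be sketched as follows. This is a toy under stated assumptions, not the lexer described above: the token grammar is deliberately tiny, and the `reference` list is hard-coded as a stand-in for the output of a trusted lexer (it was not produced by running clang).

```python
import re

# A deliberately tiny token grammar. Real C lexing (literals, comments,
# preprocessing) is far more involved.
TOKEN_RE = re.compile(r"""
    (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>==|!=|<=|>=|[-+*/=<>(){};,])
  | (?P<space>\s+)
""", re.VERBOSE)

def lex(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def first_mismatch(candidate, reference):
    """Index of the first differing token, or None if the streams agree."""
    for i, (a, b) in enumerate(zip(candidate, reference)):
        if a != b:
            return i
    if len(candidate) != len(reference):
        return min(len(candidate), len(reference))
    return None

# Hard-coded stand-in for a trusted lexer's token stream.
reference = [
    ("ident", "int"), ("ident", "x"), ("punct", "="),
    ("ident", "a"), ("punct", "+"), ("number", "42"), ("punct", ";"),
]

agree = first_mismatch(lex("int x = a + 42;"), reference)
differ = first_mismatch(lex("int x = a - 42;"), reference)
```

Reporting the index of the first divergence, rather than a plain pass or fail, is what makes this kind of comparison useful on inputs as large as a kernel source file.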

                            • rwmj

                              today at 10:06 PM

                              It's not really important in latent space / conceptually.

                          • GaggiX

                            today at 9:59 PM

                            Clang is not written in Rust tho

                              • underdeserver

                                today at 10:00 PM

                                jinx

                    • lubujackson

                      today at 10:48 PM

                      This is very much a "vibe coding can build you the Great Pyramids but it can't build a cathedral" situation, as described earlier today: https://news.ycombinator.com/item?id=46898223

                      I know this is an impressive accomplishment and is meant to show us the future potential, but it achieves big results by throwing an insane amount of compute at the problem, brute forcing its way to functionality. $20,000 set on fire, at Claude's discounted Max pricing no less.

                      Linear results from exponential compute is not nothing, but this certainly feels like a dead-end approach. The frontier should be more complexity for less compute, not more complexity from an insane amount more compute.

                      • NitpickLawyer

                        today at 7:28 PM

                        This is a much more reasonable take than the cursor-browser thing. A few things that make it pretty impressive:

                        > This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQlite, postgres, redis

                        > I started by drafting what I wanted: a from-scratch optimizing compiler with no dependencies, GCC-compatible, able to compile the Linux kernel, and designed to support multiple backends. While I specified some aspects of the design (e.g., that it should have an SSA IR to enable multiple optimization passes) I did not go into any detail on how to do so.

                        > Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects.

                        And the very open points about limitations (and hacks, as cc loves hacks):

                        > It lacks the 16-bit x86 compiler that is necessary to boot [...] Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

                        > It does not have its own assembler and linker;

                        > Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

                        Ending with a very down to earth take:

                        > The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

                        All in all, I'd say it's a cool little experiment, impressive even with the limitations, and a good test-case as the author says "The resulting compiler has nearly reached the limits of Opus’s abilities". Yeah, that's fair, but still highly impressive IMO.

                          • geraneum

                            today at 7:39 PM

                            > This was a clean-room implementation

                            This is really pushing it, considering it’s trained on… internet, with all available c compilers. The work is already impressive enough, no need for such misleading statements.

                              • raincole

                                today at 9:38 PM

                                It's not a clean-room implementation, but not because it's trained on the internet.

                                It's not a clean-room implementation because of this:

                                > The fix was to use GCC as an online known-good compiler oracle to compare against
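
The harness used in the blog post is not shown in this thread, so the following is only a minimal sketch of the general technique of testing against a known-good oracle. Both functions are hypothetical stand-ins: `oracle_add32` plays the trusted reference and `candidate_add32` plays the implementation under test, with a deliberate bug.

```python
import random

def oracle_add32(a, b):
    """Trusted reference: 32-bit two's-complement wrapping addition."""
    return (a + b + 2**31) % 2**32 - 2**31

def candidate_add32(a, b):
    """Implementation under test, with a deliberate bug: it never wraps."""
    return a + b

def find_disagreements(candidate, oracle, cases):
    """Return every input on which the candidate and the oracle differ."""
    return [case for case in cases if candidate(*case) != oracle(*case)]

# Seeded generator so the run is reproducible.
rng = random.Random(0)
cases = [
    (rng.randrange(-2**31, 2**31), rng.randrange(-2**31, 2**31))
    for _ in range(1000)
]
cases.append((2**31 - 1, 1))  # a known overflow case

failures = find_disagreements(candidate_add32, oracle_add32, cases)
```

The oracle removes the need to know the right answer in advance: any disagreement localizes a bug to a concrete input, which is exactly the kind of automatic feedback an agent loop can act on.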

                                  • array_key_first

                                    today at 10:54 PM

                                    If you read the entire GCC source code and then create a compatible compiler, it's not clean room. Which Opus basically did since, I'm assuming, its training set contained the entire source of GCC. So even if they were actively referencing GCC I think that counts.

                                    • Calavar

                                      today at 10:38 PM

                                      By the classical definition of a clean room implementation, it's something that's made by looking at the output but not at the source.

                                      I agree that having a reference compiler available is a huge caveat though. They're developing against a programmatic checker for a spec that's already had millions of man-hours put into it. This is an optimal scenario for agentic coding, but the vast majority of problems that people are going to want to tackle with agentic coding are not going to look like that.

                                      • antirez

                                        today at 8:05 PM

                                        The LLM does not contain a verbatim copy of whatever it saw during the pre-training stage. It may remember certain over-represented parts; otherwise it has knowledge about a huge number of topics, but that knowledge is similar to the way you remember things you know very well. And, indeed, if you give it access to the internet or the source code of GCC and other compilers, it will implement such a project N times faster.

                                          • halxc

                                            today at 8:14 PM

                                            We all saw verbatim copies in the early LLMs. They "fixed" it by implementing filters that trigger rewrites on blatant copyright infringement.

                                            It is a research topic for heaven's sake:

                                            https://arxiv.org/abs/2504.16046

                                              • RyanCavanaugh

                                                today at 8:18 PM

                                                The internet is hundreds of billions of terabytes; a frontier model is maybe half a terabyte. While they are certainly capable of doing some verbatim recitations, this isn't just a matter of teasing out the compressed C compiler written in Rust that's already on the internet (where?) and stored inside the model.

                                                  • philipportner

                                                    today at 9:46 PM

                                                    This seems related, it may not be a codebase but they are able to extract "near" verbatim books out of Claude Sonnet.

                                                    https://arxiv.org/pdf/2601.02671

                                                    > For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).

                                                      • Aurornis

                                                        today at 10:45 PM

                                                        Their technique really stretched the definition of extracting text from the LLM.

                                                        They used a lot of different techniques to prompt with actual text from the book, then asked the LLM to continue the sentences. I only skimmed the paper but it looks like there was a lot of iteration and repetitive trials. If the LLM successfully guessed words that followed their seed, they counted that as "extraction". They had to put in a lot of the actual text to get any words back out, though. The LLM was following the style and clues in the text.

                                                        You can't literally get an LLM to give you books verbatim. These techniques always involve a lot of prompting and continuation games.

                                                    • seba_dos1

                                                      today at 10:32 PM

                                                      > The internet is hundreds of billions of terabytes; a frontier model is maybe half a terabyte.

                                                      The lesson here is that the Internet compresses pretty well.

                                                      • mft_

                                                        today at 10:15 PM

                                                        (I'm not needlessly nitpicking, as I think it matters for this discussion)

                                                        A frontier model (e.g. latest Gemini, Gpt) is likely several-to-many times larger than 500GB. Even Deepseek v3 was around 700GB.

                                                        But your overall point still stands, regardless.

                                                    • Aurornis

                                                      today at 10:39 PM

                                                      Simple logic will demonstrate that you can't fit every document in the training set into the parameters of an LLM.

                                                      Citing a random arXiv paper from 2025 doesn't mean "they" used this technique. It was someone's paper that they uploaded to arXiv, which anyone can do.

                                                      • ben_w

                                                        today at 8:24 PM

                                                        We saw partial copies of large or rare documents, and full copies of smaller widely-reproduced documents, not full copies of everything. An e.g. 1 trillion parameter model is not a lossless copy of a ten-petabyte slice of plain text from the internet.

                                                        The distinction may not have mattered for copyright laws if things had gone down differently, but the gap between "blurry JPEG of the internet" and "learned stuff" is more obviously important when it comes to e.g. "can it make a working compiler?"

                                                          • tza54j

                                                            today at 9:09 PM

                                                            We are here in a clean room implementation thread, and verbatim copies of entire works are irrelevant to that topic.

                                                            It is enough to have read even parts of a work for something to be considered a derivative.

                                                            I would also argue that language models who need gargantuan amounts of training material in order to work by definition can only output derivative works.

                                                            It does not help that certain people in this thread (not you) edit their comments to backpedal and make the followup comments look illogical, but that is in line with their sleazy post-LLM behavior.

                                                              • ben_w

                                                                today at 9:35 PM

                                                                > It is enough to have read even parts of a work for something to be considered a derivative.

                                                                For IP rights, I'll buy that. Not as important when the question is capabilities.

                                                                > I would also argue that language models who need gargantuan amounts of training material in order to work by definition can only output derivative works.

                                                                For similar reasons, I'm not going to argue against anyone saying that all machine learning today, doesn't count as "intelligent":

                                                                It is perfectly reasonable to define "intelligence" to be the inverse of how many examples are needed.

                                                                ML partially makes up for being (by this definition) thick as an algal bloom, by being stupid so fast it actually can read the whole internet.

                                                            • antirez

                                                              today at 8:44 PM

                                                              Besides, the fact that an LLM may recall parts of certain documents, like I can recall the incipits of certain novels, does not mean that when you ask the LLM to do other kinds of work, work that is not recalling stuff, it will mix in such things verbatim. The LLM knows what it is doing in a variety of contexts, and uses that knowledge to produce stuff. The fact that many people find it bitter that LLMs can do things that replace humans does not mean (and it is not true) that this happens mainly through memorization. What coding agents can do today cannot be explained by memorization of verbatim stuff. So it's not a matter of copyright. Certain folks are fighting the wrong battle.

                                                                • shakna

                                                                  today at 10:12 PM

                                                                  During a "clean room" implementation, the implementor is generally selected for not being familiar with the workings of what they're implementing, and banned from researching using it.

                                                                  Because it _has_ been enough that, if you can recall things, your implementation ends up not being "clean room" and gets trashed by the lawyers who get involved.

                                                                  I mean... It's in the name.

                                                                  > The term implies that the design team works in an environment that is "clean" or demonstrably uncontaminated by any knowledge of the proprietary techniques used by the competitor.

                                                                  If it can recall... Then it is not a clean room implementation. Fin.

                                                              • philipportner

                                                                today at 9:49 PM

                                                                Granted, these are some of the most widely spread texts, but just fyi:

                                                                https://arxiv.org/pdf/2601.02671

                                                                > For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).

                                                                  • ben_w

                                                                    today at 9:56 PM

                                                                    Already aware of that work, that's why I phrased it the way I did :)

                                                                    Edit: actually, no, I take that back, that's just very similar to some other research I was familiar with.

                                                                • boroboro4

                                                                  today at 8:47 PM

                                                                  While I mostly agree with you, it's worth noting that modern LLMs are trained on 10-20-30T tokens, which is quite comparable to their size (especially given how compressible the data is).
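
The sizes quoted in this subthread can be put side by side with some back-of-the-envelope arithmetic. The parameter count, corpus size and token range come from the comments above; the bytes per parameter and bytes per token are assumptions, labeled as such in the code, and the result shifts with them.

```python
# Figures quoted in the thread.
PARAMS = 10**12               # "an e.g. 1 trillion parameter model"
CORPUS_BYTES = 10 * 10**15    # "a ten-petabyte slice of plain text"
TOKENS = 20 * 10**12          # middle of the "10-20-30T" token range

# Assumptions, not from the thread.
BYTES_PER_PARAM = 2           # assumes 16-bit weights
BYTES_PER_TOKEN = 4           # assumes roughly 4 bytes of text per token

model_bytes = PARAMS * BYTES_PER_PARAM
token_bytes = TOKENS * BYTES_PER_TOKEN

# How many times larger the data is than the weights.
corpus_ratio = CORPUS_BYTES / model_bytes
token_ratio = token_bytes / model_bytes
```

Under these assumptions the ten-petabyte slice is about 5000 times larger than the weights, and the training tokens about 40 times larger. Whether 40 to 1 counts as "quite comparable" depends on how compressible one takes the text to be.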

                                                              • soulofmischief

                                                                today at 9:13 PM

                                                                The point is that it's a probabilistic knowledge manifold, not a database.

                                                                  • PunchyHamster

                                                                    today at 9:17 PM

                                                                    we all know that.

                                                            • PunchyHamster

                                                              today at 9:17 PM

                                                              So it will copy most code with adding subtle bugs

                                                            • inchargeoncall

                                                              today at 9:20 PM

                                                              [flagged]

                                                                • teaearlgraycold

                                                                  today at 9:25 PM

                                                                  With just a few thousand dollars of API credits you too can inefficiently download a lossy copy of a C compiler!

                                                          • modeless

                                                            today at 7:35 PM

                                                            There seem to still be a lot of people who look at results like this and evaluate them purely based on the current state. I don't know how you can look at this and not realize that it represents a huge improvement over just a few months ago, there have been continuous improvements for many years now, and there is no reason to believe progress is stopping here. If you project out just one year, even assuming progress stops after that, the implications are staggering.

                                                              • zamadatix

                                                                today at 9:22 PM

                                                                The improvements in tool use and agentic loops have been fast and furious lately, delivering great results. The model growth itself is feeling more "slow and linear" lately, but what you can do with models as part of an overall system has been increasing in growth rate and that has been delivering a lot of value. It matters less if the model natively can keep infinite context or figure things out on its own in one shot so long as it can orchestrate external tools to achieve that over time.

                                                                • nozzlegear

                                                                  today at 9:16 PM

                                                                  Every S-curve looks like an exponential until you hit the bend.
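
The claim can be checked numerically with a small sketch. All parameters are hypothetical and chosen only for illustration: a logistic curve is compared with the pure exponential it approaches well before its midpoint.

```python
import math

# Hypothetical parameters, chosen only for illustration.
CEILING = 100.0   # saturation level of the logistic curve
RATE = 1.0        # growth rate shared by both curves
MIDPOINT = 10.0   # time of the logistic inflection point ("the bend")

def logistic(t):
    """S-curve that saturates at CEILING."""
    return CEILING / (1.0 + math.exp(-RATE * (t - MIDPOINT)))

def exponential(t):
    """The exponential that the logistic approaches for t far below MIDPOINT."""
    return CEILING * math.exp(RATE * (t - MIDPOINT))

def relative_gap(t):
    """How far the exponential is from the logistic, relative to the logistic."""
    return abs(exponential(t) - logistic(t)) / logistic(t)

early_gap = relative_gap(2.0)    # long before the bend
late_gap = relative_gap(12.0)    # shortly after the bend
```

Long before the midpoint the two curves differ by well under one percent, so early data points cannot tell them apart; shortly after it the exponential overshoots the logistic several times over.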

                                                                    • NitpickLawyer

                                                                      today at 9:24 PM

                                                                      We've been hearing this for 3 years now. And especially 25 was full of "they've hit a wall, no more data, running out of data, plateau this, saturated that". And yet, here we are. Models keep on getting better, at more broad tasks, and more useful by the month.

                                                                        • nozzlegear

                                                                          today at 9:56 PM

                                                                          > We've been hearing this for 3 years now

                                                                          Not from me you haven't!

                                                                          > "they've hit a wall, no more data, running out of data, plateau this, saturated that"

                                                                          Everyone thought Moore's Law was infallible too, right until they hit that bend. What hubris to think these AI models are different!

                                                                          But you've probably been hearing that for 3 years too (though not from me).

                                                                          > Models keep on getting better, at more broad tasks, and more useful by the month.

                                                                          If you say so, I'll take your word for it.

                                                                            • Cyphase

                                                                              today at 10:06 PM

                                                                              25 is 2025.

                                                                                • nozzlegear

                                                                                  today at 10:25 PM

                                                                                  Oh my bad, the way it was worded made me read it as the name of somebody's model or something.

                                                                              • torginus

                                                                                today at 10:07 PM

                                                                                Except that with Moore's law, everyone knew decades ahead what the limits of Dennard scaling were (shrinking geometry through smaller optical feature sizes), and roughly when we would get to the limit.

                                                                                Since then, all improvements came at a tradeoff, and there was a definite flattening of progress.

                                                                                  • nozzlegear

                                                                                    today at 10:28 PM

                                                                                    > Since then, all improvements came at a tradeoff, and there was a definite flattening of progress.

                                                                                    Idk, that sounds remarkably similar to these AI models to me.

                                                                            • fmbb

                                                                              today at 10:18 PM

                                                                              > And yet, here we are.

                                                                              I dunno. To me it doesn’t even look exponential any more. We are at most on the straight part of the incline.

                                                                                • bopbopbop7

                                                                                  today at 10:23 PM

                                                                                  People are confusing exponential improvement with the exponential pre-IPO marketing budget increase at Anthropic and OpenAI.

                                                                          • raincole

                                                                            today at 9:29 PM

                                                                            This quote would be more impactful if people hadn't been repeating it since gpt-4 time.

                                                                              • kimixa

                                                                                today at 10:11 PM

                                                                                People have also been saying we'd be seeing the results of 100x quality improvements in software with a corresponding decrease in cost since gpt-4 time.

                                                                                So where is that?

                                                                                • nozzlegear

                                                                                  today at 9:58 PM

                                                                                  I agree, I have been informed that people have been repeating it for three years. Sadly I'm not involved in the AI hype bubble so I wasn't aware. What an embarrassing faux pas!

                                                                          • chasd00

                                                                            today at 10:05 PM

                                                                            I have to admit, even if model and tooling progress stopped dead today, the world of software development has forever changed and will never go back.

                                                                        • gmueckl

                                                                          today at 7:43 PM

                                                                          The result is hardly a clean room implementation. It was rather a brute force attempt to decompress fuzzily stored knowledge contained within the network and it required close steering (using a big suite of tests) to get a reasonable approximation to the desired output. The compression and storage happened during the LLM training.

                                                                          Prove this statement wrong.

                                                                            • libraryofbabel

                                                                              today at 9:07 PM

                                                                              Nobody disputes that the LLM was drawing on knowledge in its training data. Obviously it was! But you'll need to be a bit more specific with your critique, because there is a whole spectrum of interpretations, from "it just decompressed fuzzily-stored code verbatim from the internet" (obviously wrong, since the Rust-based C compiler it wrote doesn't exist on the internet) all the way to "it used general knowledge from its training about compiler architecture and x86 and the C language."

                                                                              Your post is phrased like it's a two sentence slam-dunk refutation of Anthropic's claims. I don't think it is, and I'm not even clear on what you're claiming precisely except that LLMs use knowledge acquired during training, which we all agree on here.

                                                                              • NitpickLawyer

                                                                                today at 7:47 PM

                                                                                > Prove this statement wrong.

                                                                                If all it takes is "trained on the Internet" and "decompress stored knowledge", then surely gpt3, 3.5, 4, 4.1, 4o, o1, o3, o4, 5, 5.1, 5.x should have been able to do it, right? Claude 2, 3, 4, 4.1, 4.5? Surely.

                                                                                  • shakna

                                                                                    today at 10:07 PM

                                                                                    Well, "Reimplement the c4 compiler - C in four functions" is absolutely something older models can do, because most are trained on that quite small product - it's 20 KB.

                                                                                    But reimplementing that isn't impressive, because it's not a clean room implementation if you trained on that data to make the model that regurgitates the effort.

                                                                                    • gmueckl

                                                                                      today at 9:41 PM

                                                                                      This comparison is only meaningful with comparable numbers of parameters and context window tokens. And then it would mainly test the efficiency and accuracy of the information encoding. I would argue that this is the main improvement over all model generations.

                                                                                      • hn_acc1

                                                                                        today at 9:50 PM

                                                                                        Are you really asking for "all the previous versions were implemented so poorly they couldn't even do this simple, basic LLM task"?

                                                                                        • geraneum

                                                                                          today at 7:51 PM

                                                                                          Perhaps 4.5 could also do it? We don’t really know until we try. I don’t trust the marketing material as much. The fact that the previous (smaller) versions could or couldn’t do it does not really disprove that claim.

                                                                                      • Marha01

                                                                                        today at 8:09 PM

                                                                                        Even with 1 TB of weights (probable size of the largest state of the art models), the network is far too small to contain any significant part of the internet as compressed data, unless you really stretch the definition of data compression.

                                                                                          • jesse__

                                                                                            today at 8:49 PM

                                                                                            This sounds very wrong to me.

                                                                                            Take the C4 training dataset for example. The uncompressed, uncleaned, size of the dataset is ~6TB, and contains an exhaustive English language scrape of the public internet from 2019. The cleaned (still uncompressed) dataset is significantly less than 1TB.

                                                                                            I could go on, but, I think it's already pretty obvious that 1TB is more than enough storage to represent a significant portion of the internet.

                                                                                              • FeepingCreature

                                                                                                today at 9:52 PM

                                                                                                This would imply that the English internet is not much bigger than 20x the English Wikipedia.

                                                                                                That seems implausible.
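The size argument in this subthread comes down to a few divisions, which can be spelled out. This is only a sketch of the arithmetic, using the approximate figures the commenters quote (1 TB of weights, a roughly 6 TB raw C4 scrape, a 20x Wikipedia multiple); none of these are measured values, and the function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the figures quoted in this subthread.
# All inputs are the commenters' rough numbers, not verified measurements.

TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def required_compression_ratio(data_bytes, storage_bytes):
    """How many times the data must shrink to fit in the given storage."""
    return data_bytes / storage_bytes


def implied_unit_size(total_bytes, multiple):
    """If total_bytes is `multiple` times some unit, how big is one unit?"""
    return total_bytes / multiple


weights = 1 * TB   # "1 TB of weights" (a guess from the thread)
c4_raw = 6 * TB    # "~6TB" uncompressed, uncleaned C4 (as quoted)

# Fitting the raw scrape into the weights would need roughly 6x compression.
ratio = required_compression_ratio(c4_raw, weights)

# "Not much bigger than 20x the English Wikipedia" applied to 1 TB implies
# the commenter has a Wikipedia text size of about 50 GB in mind.
wikipedia_implied = implied_unit_size(1 * TB, 20)

print(f"compression ratio needed: {ratio:.0f}x")
print(f"implied Wikipedia size: {wikipedia_implied / GB:.0f} GB")
```

Whether a ratio of that order counts as "compression" or as "stretching the definition" is exactly what the commenters disagree about; the arithmetic does not settle it.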

                                                                                            • kgeist

                                                                                              today at 9:46 PM

                                                                                              A lot of the internet is duplicate data, low quality content, SEO spam etc. I wouldn't be surprised if 1 TB is a significant portion of the high-quality, information-dense part of the internet.

                                                                                                • FeepingCreature

                                                                                                  today at 9:53 PM

                                                                                                  I would be extremely surprised if it was that small.

                                                                                              • gmueckl

                                                                                                today at 9:34 PM

                                                                                                This is obviously wrong. There is a bunch of knowledge embedded in those weights, and some of it can be recalled verbatim. So, by virtue of this recall alone, training is a form of lossy data compression.

                                                                                            • 0xCMP

                                                                                              today at 9:34 PM

                                                                                              I challenge anyone to try building a C compiler without a big suite of tests. Zig is the most recent attempt and they had an extensive test suite. I don't see how that is disqualifying.

                                                                                              If you're testing a model I think it's reasonable that "clean room" have an exception for the model itself. They kept it offline and gave it a sandbox to avoid letting it find the answers for itself.

                                                                                              Yes the compression and storage happened during the training. Before it still didn't work; now it does much better.

                                                                                                • hn_acc1

                                                                                                  today at 9:55 PM

                                                                                                  The point is - for a NEW project, no one has an extensive test suite. And if an extensive test suite exists, it's probably because the product that uses it also exists, already.

                                                                                                  If it could translate the C++ standard INTO an extensive test suite that actually captures most corner cases, and doesn't generate false positives - again, without internet access and without using gcc as an oracle, etc?

                                                                                              • brutalc

                                                                                                today at 7:50 PM

                                                                                                No one needs to prove you wrong. That’s just personal insecurity trying to justify one's own worth.

                                                                                                  • linuxtorvals

                                                                                                    today at 7:55 PM

                                                                                                    [flagged]

                                                                                            • panzi

                                                                                              today at 9:58 PM

                                                                                              > clean-room implementation

                                                                                              Except it's trained on all source out there, so I assume on GCC and clang. I wonder how similar the code is to either.

                                                                                              • dyauspitr

                                                                                                today at 9:58 PM

                                                                                                > Claude did not have internet access at any point during its development

                                                                                                Why is this even desirable? I want my LLM to take into account everything there is out there and give me the best possible output.

                                                                                                  • simonw

                                                                                                    today at 10:23 PM

                                                                                                    It's desirable if you're trying to build a C compiler as a demo of coding agent capabilities without all of the Hacker News commenters saying "yeah but it could just copy implementation details from the internet".

                                                                                            • itay-maman

                                                                                              today at 9:42 PM

                                                                                              My first reaction: wow, incredible.

                                                                                              My second reaction: still incredible, but noting that a C compiler is one of the most rigorously specified pieces of software out there. The spec is precise, the expected behavior is well-defined, and test cases are unambiguous.

                                                                                              I'm curious how well this translates to the kind of work most of us do day-to-day where requirements are fuzzy, many edge cases are discovered on the go, and what we want to build is a moving target.

                                                                                                • softwaredoug

                                                                                                  today at 10:50 PM

                                                                                                  Yes I think any codegen with a lot of tests and verification is more about “fitting” to the tests. Like fitting an ML model. It’s model training, not coding.

                                                                                                  But in a lot of programming we discover correctness as we go, which is one reason humans don’t completely exit the loop. We need to see and build tests as we go, giving them particular care and attention to ensure they test what matters.

                                                                                                  • ndesaulniers

                                                                                                    today at 9:46 PM

                                                                                                    > C compiler is one of the most rigorously specified pieces of software out there

                                                                                                    /me Laughs in "unspecified behavior."

                                                                                                      • ori_b

                                                                                                        today at 10:03 PM

                                                                                                        There's undefined behavior, which is quite well specified. What do you mean by unspecified behavior? Do you have an example?

                                                                                                        • irishcoffee

                                                                                                          today at 10:30 PM

                                                                                                          Undefined is absolutely clear in the spec.

                                                                                                          Unspecified is whatever you want it to mean. I am also laughing, having never heard "unspecified" before.

                                                                                                  • 201984

                                                                                                    today at 9:09 PM

                                                                                                    https://github.com/anthropics/claudes-c-compiler/issues/1

                                                                                                      • Philpax

                                                                                                        today at 9:31 PM

                                                                                                        The issue is that it's missing the include paths. The compiler itself is fine.

                                                                                                        • krupan

                                                                                                          today at 9:13 PM

                                                                                                          Thank you. That was a long article that started with a claim that was backed up by no proof, dismissing it as not the most interesting thing they were talking about when in fact it's the baseline of the whole discussion.

                                                                                                          • Retr0id

                                                                                                            today at 9:21 PM

                                                                                                            Looks like these users are just missing glibc-devel or equivalent?

                                                                                                              • delusional

                                                                                                                today at 9:24 PM

                                                                                                                Naa, it looks like it's failing to include the standard system include directories. If you take them from gcc and pass them as -I, it'll compile.

                                                                                                                  • Retr0id

                                                                                                                    today at 9:26 PM

                                                                                                                    Can confirm (on aarch64 host)

                                                                                                                        $ ./target/release/ccc-arm -I /usr/include/ -I /usr/local/include/ -I /usr/lib/gcc/aarch64-redhat-linux/15/include/ -o hello hello.c 
                                                                                                                    
                                                                                                                        $ ./hello
                                                                                                                        Hello from CCC!
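The workaround described above (take the include directories from gcc and pass them as -I) can be scripted. The following is a minimal sketch, assuming a GCC-style driver that prints its search list on stderr when run as `gcc -E -Wp,-v -xc /dev/null`; the helper names `parse_include_dirs`, `include_flags` and `gcc_include_dirs` are hypothetical and not part of the compiler discussed here.

```python
import subprocess


def parse_include_dirs(verbose_output):
    """Extract the directories from GCC's verbose preprocessor search list.

    GCC prints the list between a '#include <...> search starts here:'
    header and an 'End of search list.' footer, one indented path per line.
    """
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include") and "search starts here" in line:
            in_list = True
            continue
        if line.startswith("End of search list"):
            break
        if in_list and line.startswith(" "):
            dirs.append(line.strip())
    return dirs


def include_flags(dirs):
    """Turn a list of directories into ['-I', dir, '-I', dir, ...]."""
    flags = []
    for d in dirs:
        flags.extend(["-I", d])
    return flags


def gcc_include_dirs(gcc="gcc"):
    """Ask the local gcc for its search list (assumes gcc is on PATH)."""
    result = subprocess.run(
        [gcc, "-E", "-Wp,-v", "-xc", "/dev/null"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The search list is written to stderr, not stdout.
    return parse_include_dirs(result.stderr)
```

On a machine with gcc installed, `include_flags(gcc_include_dirs())` would give the arguments to place before `-o hello hello.c` in the command shown above. The directories differ per distro and architecture, which is consistent with some commenters needing the flags and others not.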

                                                                                                                      • u8080

                                                                                                                        today at 9:34 PM

                                                                                                                        Seems this non-artificial intelligence model is just too limited to understand the concept of an include path.

                                                                                                                          • dyauspitr

                                                                                                                            today at 10:01 PM

                                                                                                                            It’s machine specific

                                                                                                                    • zamadatix

                                                                                                                      today at 9:37 PM

                                                                                                                      Hmm, I didn't have to do that. https://i.imgur.com/OAEtgvr.png

                                                                                                                      But yeah, either way it just needs to know where to find the stdlib.

                                                                                                                        • Retr0id

                                                                                                                          today at 9:38 PM

                                                                                                                          Probably depends on where your distro puts stuff by default, I think it has a few of the common include paths hardcoded.

                                                                                                                            • zamadatix

                                                                                                                              today at 9:39 PM

                                                                                                                              Makes sense for the behavior.

                                                                                                                • worldsavior

                                                                                                                  today at 9:13 PM

                                                                                                                  AI is the future.

                                                                                                                  • suddenlybananas

                                                                                                                    today at 9:25 PM

                                                                                                                    This is truly incredible.

                                                                                                                    • ZeWaka

                                                                                                                      today at 9:22 PM

                                                                                                                      lol, lmao

                                                                                                                  • btown

                                                                                                                    today at 7:31 PM

                                                                                                                    > This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQlite, postgres, redis, and has a 99% pass rate on most compiler test suites including the GCC torture test suite. It also passes the developer's ultimate litmus test: it can compile and run Doom.

                                                                                                                    This is incredible!

                                                                                                                    But it also speaks to the limitations of these systems: while these agentic systems can do amazing things when automatically-evaluable, robust test suites exist... you hit diminishing returns when you, as a human orchestrator of agentic systems, are making business decisions as fast as the AI can bring them to your attention. And that assumes the AI isn't just making business assumptions with the same lack of context, compounded with motivation to seem self-reliant, that a non-goal-aligned human contractor would have.

                                                                                                                      • _qua

                                                                                                                        today at 7:34 PM

                                                                                                                        Interesting how the concept of a clean room implementation changes when the agent has been trained on the entire internet already

                                                                                                                          • falcor84

                                                                                                                            today at 7:48 PM

                                                                                                                            To the best of my knowledge, there's no Rust-based compiler that comes anywhere close to 99% on the GCC torture test suite, or able to compile Doom. So even if it saw the internals of GCC and a lot of other compilers, the ability to recreate this step-by-step in Rust is extremely impressive to me.

                                                                                                                              • jsheard

                                                                                                                                today at 7:49 PM

                                                                                                                                The impressiveness of converting C to Rust by any means is kind of contingent on how much unnecessary unsafe there is in the end result though.

                                                                                                                        • falcor84

                                                                                                                          today at 7:43 PM

                                                                                                                          Agreed, but the next step is having an AI agent actually run the business and be able to get the business context it needs as a human would. Obviously we're not quite there, but with the rapid progress on benchmarks like Vending-Bench [0], and especially with this teams approach, it doesn't seem far-fetched anymore.

                                                                                                                          As a particular near-term step, I imagine that it won't be long before we see a SaaS company using an AI product manager, which can spawn agents to directly interview users as they utilize the app, independently propose and (after getting approval) run small product experiments, and come up with validated recommendations for changing the product roadmap. I still remember Tay, and wouldn't give something like that the keys to the kingdom any time soon, but as long as there's a human decision maker at the end, I think that the tech is already here.

                                                                                                                          [0] https://andonlabs.com/evals/vending-bench-2

                                                                                                                      • softwaredoug

                                                                                                                        today at 10:47 PM

                                                                                                                        I think we’re getting to a place where for anything with extensive verification available we’ll be “fitting” code to a task against tests like we fit an ML model to a loss function.

                                                                                                                        • Havoc

                                                                                                                          today at 9:02 PM

                                                                                                                          Cool project, but they really could have skipped the mention of clean room. Something trained on every copyrighted thing known to mankind is the opposite of clean room

                                                                                                                            • cheema33

                                                                                                                              today at 10:06 PM

                                                                                                                              As others have pointed out, humans train on existing codebases as well. And then use that knowledge to build clean room implementations.

                                                                                                                                • mxey

                                                                                                                                  today at 10:26 PM

                                                                                                                                  That’s the opposite of clean-room. The whole point of clean-room design is that you have your software written by people who have not looked into the competing, existing implementation, to prevent any claim of plagiarism.

                                                                                                                                  “Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”

                                                                                                                                  • regularfry

                                                                                                                                    today at 10:23 PM

                                                                                                                                    What they don't do is read the product they're clean-rooming. That's kinda disqualifying. Impossible to know if the GCC source is in 4.6's training set but it would be kinda weird if it wasn't.

                                                                                                                                    • pizlonator

                                                                                                                                      today at 10:24 PM

                                                                                                                                      Not the same.

                                                                                                                                      I have read nowhere near as much code (or anything) as what Claude has to read to get to where it is.

                                                                                                                                      And I can write an optimizing compiler that isn't slower than GCC -O0

                                                                                                                                      • cermicelli

                                                                                                                                        today at 10:16 PM

                                                                                                                                        If that's what clean room means to you, I do know AI can definitely replace you. As even ChatGPT is better than that.

                                                                                                                                        (prompt: what does a clean room implementation mean?)

                                                                                                                                        From ChatGPT without login BTW!

                                                                                                                                        > A clean room implementation is a way of building something (usually software) without copying or being influenced by the original implementation, so you avoid copyright or IP issues.

                                                                                                                                        > The core idea is separation.

                                                                                                                                        > Here’s how it usually works:

                                                                                                                                        > The basic setup

                                                                                                                                        > Two teams (or two roles):

                                                                                                                                        > Specification team (the “dirty room”)

                                                                                                                                        > Looks at the original product, code, or behavior

                                                                                                                                        > Documents what it does, not how it does it

                                                                                                                                        > Produces specs, interfaces, test cases, and behavior descriptions

                                                                                                                                        > Implementation team (the “clean room”)

                                                                                                                                        > Never sees the original code

                                                                                                                                        > Only reads the specs

                                                                                                                                        > Writes a brand-new implementation from scratch

                                                                                                                                        > Because the clean team never touches the original code, their work is considered independently created, even if the behavior matches.

                                                                                                                                        > Why people do this

                                                                                                                                        > Reverse-engineering legally

                                                                                                                                        > Avoid copyright infringement

                                                                                                                                        > Reimplement proprietary systems

                                                                                                                                        > Create open-source replacements

                                                                                                                                        > Build compatible software (file formats, APIs, protocols)

                                                                                                                                        I really am starting to think we have achieved AGI. > Average (G)Human Intelligence

                                                                                                                                        LMAO

                                                                                                                                    • benjiro

                                                                                                                                      today at 9:50 PM

                                                                                                                                      Hot take:

                                                                                                                                      If you try to reimplement something in a clean room, it's a step-by-step process, using your own accumulated knowledge as the basis. That knowledge you hold in your brain is all too often code that may be copyrighted, from the companies you worked at.

                                                                                                                                      Is it any different for an LLM?

                                                                                                                                      The fact that the LLM is trained on more data does not change this: when you work for a company, leave it, and take that accumulated knowledge to a different company, you are by definition taking that knowledge (which may be copyrighted) and implementing it somewhere else. It's only an issue if you copy the code directly, or do the implementation as a 1:1 copy. LLMs do not make 1:1 copies of the original.

                                                                                                                                      At what point is an LLM trained on copyrighted data any different than a human trained on copyrighted data, when that data gets reimplemented in a transformative way? The big difference is that the LLM can hold more data over more fields than a human, true... But if we look at specializations, this can come back to the same, no?

                                                                                                                                        • cermicelli

                                                                                                                                          today at 10:22 PM

                                                                                                                                          If you have worked on a related copyrighted work you can't work on a clean room implementation. You will be sued. There are lots of people who have tried and found out.

                                                                                                                                          Sure, they weren't trillion-dollar AI companies able to bankroll the defense. But invoking clean room while using copyrighted stuff is not even an argument; it's just nonsense to try to prove something when no one asked.

                                                                                                                                  • whinvik

                                                                                                                                    today at 7:26 PM

                                                                                                                                    It's weird to see the expectation that the result should be perfect.

                                                                                                                                    All said and done, that it's even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!

                                                                                                                                      • regularfry

                                                                                                                                        today at 10:25 PM

                                                                                                                                        This is firmly where I am. "The wonder is not how well the dog dances, it is that it dances at all."

                                                                                                                                        • minimaxir

                                                                                                                                          today at 7:35 PM

                                                                                                                                          A symptom of the increasing backlash against generative AI (both in creative industries and in coding) is that any flaw in the resulting product is grounds to call it AI slop, even if it's very explicitly upfront that it's an experimental demo/proof of concept and not the NEXT BIG THING being hyped by influencers. That nuance is dead even outside of social media.

                                                                                                                                            • stonogo

                                                                                                                                              today at 7:46 PM

                                                                                                                                              AI companies set that expectation when their CEOs ran around telling anyone who would listen that their product is a generational paradigm shift that will completely restructure both labor markets and human cognition itself. There is no nuance in their own PR, so why should they benefit from any when their product can't meet those expectations?

                                                                                                                                                • minimaxir

                                                                                                                                                  today at 7:53 PM

                                                                                                                                                  Because it leads to poor and nonconstructive discourse that doesn't educate anyone about the implications of the tech, which is expected on social media but has annoyingly leaked to Hacker News.

                                                                                                                                                  There's been more than enough drive-by comments from new accounts/green names even in this HN submission alone.

                                                                                                                                                    • krupan

                                                                                                                                                      today at 9:15 PM

                                                                                                                                                      It does lead to poor non-constructive discourse. That's why we keep taking those CEOs to task on it. Why are you not?

                                                                                                                                                        • dwaltrip

                                                                                                                                                          today at 9:31 PM

                                                                                                                                                          The CEOs aren't here in the comments.

                                                                                                                                      • rwmj

                                                                                                                                        today at 9:58 PM

                                                                                                                                        The interesting thing here is what's this code worth (in money terms)? I would say it's worth only the cost of recreation, apparently $20,000, and not very much more. Perhaps you can add a bit for the time taken to prompt it. Anyone who can afford that can use the same prompt to generate another C compiler, and another one and another one.

                                                                                                                                        GCC and Clang are worth much much more because they are battle-tested compilers that we understand and know work, even in a multitude of corner cases, over decades.

                                                                                                                                        In future there's going to be lots and lots of basically worthless code, generated and regenerated over and over again. What will distinguish code that provides value? It's going to be code - however it was created, could be AI or human - that has actually been used and maintained in production for a long time, with a community or company behind it, bugs being triaged and fixed and so on.

                                                                                                                                          • kingstnap

                                                                                                                                            today at 10:28 PM

                                                                                                                                            The code isn't worth money. This is an experiment. The knowledge that something like this is even possible is what is worth money.

                                                                                                                                            If you had known in 2022 that a transformer could pull this off, even with all its flawed code, you would have been floored.

                                                                                                                                            Keep in mind that just a few years ago, the state of the art in what these LLMs could do was questions of this nature:

                                                                                                                                            Suppose g(x) = fāˆ’1 (x), g(0) = 5, g(4) = 7, g(3) = 2, g(7) = 9, g(9) = 6 what is f(f(f(6)))?

                                                                                                                                            The above is from the "sparks of AGI paper" on GPT-4, where they were floored that it could coherently reason through the 3 steps of inverting things (6 -> 9 -> 7 -> 4) while GPT 3.5 was still spitting out a nonsense argument of this form:

                                                                                                                                            f(f(f(6))) = f(f(g(9))) = f(f(6)) = f(g(7)) = f(9).

                                                                                                                                            This is from March 2023 and it was genuinely very surprising at the time that these pattern matching machines trained on next token prediction could do this. Something like an LSTM can't do anything like this at all btw, nowhere close.

                                                                                                                                            To me it's very surprising that the C compiler works. It takes a ton of effort to build such a thing. I can imagine the flaws actually do get better over the next year as we push the goalposts out.
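
For reference, the inverse-function puzzle quoted above can be checked mechanically. This is only a minimal sketch: the dictionary encoding of g and the helper name `apply_n` are assumptions made for illustration, not anything from the paper.

```python
# g is given only at five points, so a lookup table is enough.
g = {0: 5, 4: 7, 3: 2, 7: 9, 9: 6}

# g = f^{-1}, so f is the inverse of g: swap keys and values.
f = {value: key for key, value in g.items()}

def apply_n(table, x, n):
    """Apply the lookup table to x, n times in a row (hypothetical helper)."""
    for _ in range(n):
        x = table[x]
    return x

result = apply_n(f, 6, 3)  # f(6) = 9, f(9) = 7, f(7) = 4
print(result)  # 4
```

The three lookups follow the same 6 -> 9 -> 7 -> 4 chain described in the comment.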

                                                                                                                                        • akrauss

                                                                                                                                          today at 7:52 PM

                                                                                                                                          I would like to see the following published:

                                                                                                                                          - All prompts used

                                                                                                                                          - The structure of the agent team (which agents / which roles)

                                                                                                                                          - Any other material that went into the process

                                                                                                                                          This would be a good source for learning, even though I'm not ready to spend $20k just to replicate the experiment.

                                                                                                                                            • password4321

                                                                                                                                              today at 9:52 PM

                                                                                                                                              Yes unfortunately these days most are satisfied with just the sausage and no details about how it was made.

                                                                                                                                          • OsrsNeedsf2P

                                                                                                                                            today at 7:14 PM

                                                                                                                                            This is like a working version of the Cursor blog. The evidence - it compiling the Linux kernel - is much more impressive than a browser that didn't even compile (until manually intervened)

                                                                                                                                              • ben_w

                                                                                                                                                today at 7:24 PM

                                                                                                                                                It certainly slightly spoils what I was planning to be a fun little April Fool's joke (a daft but complete programming language). Last year's AI wasn't good enough to get me past the compiler-compiler even for the most fundamental basics, now it's all this.

                                                                                                                                                I'll still work on it, of course. It just won't be so surprising.

                                                                                                                                            • ks2048

                                                                                                                                              today at 8:45 PM

                                                                                                                                              It's cool that you can look at the git history to see what it did. Unfortunately, I do not see any of the human written prompts (?).

                                                                                                                                              First 10 commits, "git log --all --pretty=format:%s --reverse | head",

                                                                                                                                                Initial commit: empty repo structure
                                                                                                                                                Lock: initial compiler scaffold task
                                                                                                                                                Initial compiler scaffold: full pipeline for x86-64, AArch64, RISC-V
                                                                                                                                                Lock: implement array subscript and lvalue assignments
                                                                                                                                                Implement array subscript, lvalue assignments, and short-circuit evaluation
                                                                                                                                                Add idea: type-aware codegen for correct sized operations
                                                                                                                                                Lock: type-aware codegen for correct sized operations
                                                                                                                                                Implement type-aware codegen for correct sized operations
                                                                                                                                                Lock: implement global variable support
                                                                                                                                                Implement global variable support across all three backends
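
A small sketch of how the ten subject lines listed above could be tallied by prefix. It is Python rather than shell, the `kind` helper and its category names are hypothetical, and reading `Lock:` as an agent claiming a task before working on it is an assumption drawn only from the wording of the subjects.

```python
from collections import Counter

# The ten commit subjects quoted above, in order.
subjects = [
    "Initial commit: empty repo structure",
    "Lock: initial compiler scaffold task",
    "Initial compiler scaffold: full pipeline for x86-64, AArch64, RISC-V",
    "Lock: implement array subscript and lvalue assignments",
    "Implement array subscript, lvalue assignments, and short-circuit evaluation",
    "Add idea: type-aware codegen for correct sized operations",
    "Lock: type-aware codegen for correct sized operations",
    "Implement type-aware codegen for correct sized operations",
    "Lock: implement global variable support",
    "Implement global variable support across all three backends",
]

def kind(subject):
    """Classify a subject by its prefix (hypothetical categories)."""
    if subject.startswith("Lock:"):
        return "lock"   # assumed: a task being claimed
    if subject.startswith("Add idea:"):
        return "idea"   # assumed: a proposed future task
    return "work"       # everything else

counts = Counter(kind(s) for s in subjects)
print(dict(counts))
```

On this sample, four of the ten commits are locks and one is an idea, which suggests a good share of the history is coordination rather than compiler code.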

                                                                                                                                              • karmakaze

                                                                                                                                                today at 10:23 PM

                                                                                                                                                I'm not particularly impressed that it can turn C into an SSA IR or assembly etc. The optimizations, however sophisticated, are where anything impressive would be. Then again, I would expect we have lots of examples in the training set. C compilers are probably the most popular of all compilers. What would be more impressive is for it to have made a compiler for a well defined language that isn't very close to a popular language.

                                                                                                                                                What I am impressed by is that the task it completed had many steps and the agent didn't get lost or caught in a loop in the many sessions and time it spent doing it.

                                                                                                                                                • jcalvinowens

                                                                                                                                                  today at 7:44 PM

                                                                                                                                                  How much of this result is effectively plagiarized open source compiler code? I don't understand how this is compelling at all: obviously it can regurgitate things that are nearly identical in capability to already existing code it was explicitly trained on...

                                                                                                                                                  It's very telling how these examples are all "look, we made it recreate a shittier version of a thing that already exists in the training set".

                                                                                                                                                    • Philpax

                                                                                                                                                      today at 7:49 PM

                                                                                                                                                      What Rust-based compiler is it plagiarising from?

                                                                                                                                                        • rubymamis

                                                                                                                                                          today at 8:10 PM

                                                                                                                                                          There are many, here's a simple Google search:

                                                                                                                                                          https://github.com/jyn514/saltwater

                                                                                                                                                          https://github.com/ClementTsang/rustcc

                                                                                                                                                          https://github.com/maekawatoshiki/rucc

                                                                                                                                                            • jsnell

                                                                                                                                                              today at 8:29 PM

                                                                                                                                                              Did you actually look at these?

                                                                                                                                                              > https://github.com/jyn514/saltwater

                                                                                                                                                              This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...

                                                                                                                                                              > https://github.com/ClementTsang/rustcc

                                                                                                                                                              This will compile basically no real-world code. The only supported data type is "int".

                                                                                                                                                              > https://github.com/maekawatoshiki/rucc

                                                                                                                                                              This is just a frontend. It uses LLVM as the backend.

                                                                                                                                                              • Philpax

                                                                                                                                                                today at 8:18 PM

                                                                                                                                                                Look at what those compilers are capable of compiling and to which targets, and compare it to what this compiler can do. Those are wonderful, and I have nothing but respect for them, but they aren't going to be compiling the Linux kernel.

                                                                                                                                                                  • rubymamis

                                                                                                                                                                    today at 8:29 PM

                                                                                                                                                                    I just did a quick Google search only on GitHub, maybe there are better ones out there on the internet?

                                                                                                                                                                • chilipepperhott

                                                                                                                                                                  today at 9:38 PM

                                                                                                                                                                  I found this one too: https://github.com/PhilippRados/wrecc

                                                                                                                                                              • lossolo

                                                                                                                                                                today at 8:33 PM

                                                                                                                                                                Language doesn't really matter, it's not how things are mapped in the latent space. It only needs to know how to do it in one language.

                                                                                                                                                                  • HDThoreaun

                                                                                                                                                                    today at 10:35 PM

                                                                                                                                                                    Ok you can say this about literally any compiler though. The authors of every compiler have intimate knowledge of other compilers, how is this different?

                                                                                                                                                                • jcalvinowens

                                                                                                                                                                  today at 8:01 PM

                                                                                                                                                                  Being written in rust is meaningless IMHO. There is absolutely zero inherent value to something being written in rust. Sometimes it's the right tool for the job, sometimes it isn't.

                                                                                                                                                                    • modeless

                                                                                                                                                                      today at 8:04 PM

                                                                                                                                                                      It means that it's not directly copying existing C compiler code which is overwhelmingly not written in Rust. Even if your argument is that it is plagiarizing C code and doing a direct translation to Rust, that's a pretty interesting capability for it to have.

                                                                                                                                                                        • seba_dos1

                                                                                                                                                                          today at 10:42 PM

                                                                                                                                                                          Translating things between languages is probably one of the least interesting capabilities of LLMs - it's the one thing that they're pretty much meant to do well by design.

                                                                                                                                                                          • jcalvinowens

                                                                                                                                                                            today at 8:08 PM

                                                                                                                                                                            Surely you agree that directly copying existing code into a different language is still plagiarism?

                                                                                                                                                                            I completely agree that "rewrite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.

                                                                                                                                                                          • Philpax

                                                                                                                                                                            today at 8:02 PM

                                                                                                                                                                            Please don't open a bridge to the Rust flamewar from the AI flamewar :-)

                                                                                                                                                                              • jcalvinowens

                                                                                                                                                                                today at 8:03 PM

                                                                                                                                                                                Hahaha, fair enough, but I refuse to be shy about having this opinion :)

                                                                                                                                                                    • jeroenhd

                                                                                                                                                                      today at 8:07 PM

                                                                                                                                                                      The fact it couldn't actually stick to the 16 bit ABI so it had to cheat and call out to GCC to get the system to boot says a lot.

                                                                                                                                                                      Without enough examples to copy from (despite CPU manuals being available in the training set) the approach failed. I wonder how well it'll do when you throw it a new/imaginary instruction set/CPU architecture; I bet it'll fail in similar ways.

                                                                                                                                                                        • jsnell

                                                                                                                                                                          today at 8:17 PM

                                                                                                                                                                          "Couldn't stick to the ABI ... despite CPU manuals being available" is a bizarre interpretation. What the article describes is the generated code being too large. That's an optimization problem, not a "couldn't follow the documentation" problem.

                                                                                                                                                                          And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless, all the rewards come from getting to 32kB.

                                                                                                                                                                          • jcalvinowens

                                                                                                                                                                            today at 8:14 PM

                                                                                                                                                                            IMHO a new architecture doesn't really make it any more interesting: there are too many examples of adding new architectures in the existing codebases. Maybe if the new machine had some bizarre novel property, I suppose, but I can't come up with a good example.

                                                                                                                                                                            If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.

                                                                                                                                                                        • anematode

                                                                                                                                                                          today at 7:47 PM

                                                                                                                                                                          Honestly, probably not a lot. Not that many C compilers are compatible with all of GCC's weird features, and the ones that are, I don't think are written in Rust. Hell, even clang couldn't compile the Linux kernel until ~10 years ago. This is a very impressive project.

                                                                                                                                                                      • storus

                                                                                                                                                                        today at 10:20 PM

                                                                                                                                                                        Now this is fairly "easy" as there is a multitude of implementations/specs all over the Internet. How about trying to design a new language that is unquestionably better/safer/faster for low-level system programming than C/Rust/Zig? ML is great at aping existing stuff but how about pushing it to invent something valuable instead?

                                                                                                                                                                        • danfritz

                                                                                                                                                                          today at 9:48 PM

                                                                                                                                                                          Ha yes classic showcase of:

                                                                                                                                                                          1) obvious green field project 2) well defined spec which will definitely be in the training data 3) an end result which lands you 90% from the finish

                                                                                                                                                                          Now comes the hard part, the last 10%. Still not impressed here. Since fixing issues in the end was impossible without introducing bugs I have doubts about quality

                                                                                                                                                                          I'm glad they do call it out in the end. That's fair

                                                                                                                                                                          • underdeserver

                                                                                                                                                                            today at 10:03 PM

                                                                                                                                                                            > when agents started to compile the Linux kernel, they got stuck. [...] Every agent would hit the same bug, fix that bug, and then overwrite each other's changes.

                                                                                                                                                                            > [...] The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC, and only the remaining files with Claude's C Compiler. If the kernel worked, then the problem wasn’t in Claude’s subset of the files. If it broke, then it could further refine by re-compiling some of these files with GCC. This let each agent work in parallel

                                                                                                                                                                            This is a remarkably creative solution! Nicely done.
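
                                                                                                                                                                            For anyone wondering what that oracle trick looks like in practice, here is a rough Python sketch of the idea as I read it from the quote, not the author's actual harness. The names `find_suspect` and `kernel_works` are hypothetical, and the sketch assumes each miscompiled file breaks the kernel on its own (no interactions between files). Files not in the chosen subset are assumed to be built with GCC.

```python
import random
from typing import Callable, FrozenSet, List, Sequence


def find_suspect(
    files: Sequence[str],
    kernel_works: Callable[[FrozenSet[str]], bool],
    sample_size: int = 8,
    seed: int = 0,
) -> List[str]:
    """Narrow a broken build down to one file compiled by the compiler under test.

    `kernel_works(subset)` is a hypothetical callback: it builds `subset` with
    the compiler under test and everything else with the known-good oracle
    (GCC in the article), then reports whether the resulting kernel works.
    """
    rng = random.Random(seed)
    # Randomly pick the files that the compiler under test has to handle.
    suspects = rng.sample(list(files), min(sample_size, len(files)))

    if kernel_works(frozenset(suspects)):
        return []  # the problem is not in this subset

    # The build broke: hand half of the subset back to the oracle and retry.
    # Assumes failures are independent per file, so one half must still fail.
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        if not kernel_works(frozenset(first_half)):
            suspects = first_half
        else:
            suspects = suspects[mid:]
    return suspects


# Simulated usage: pretend the compiler under test miscompiles one file.
all_files = [f"file{i}.c" for i in range(32)]
miscompiled = {"file5.c"}


def fake_kernel_works(under_test: FrozenSet[str]) -> bool:
    # Stand-in for a real build-and-boot cycle.
    return not (under_test & miscompiled)


print(find_suspect(all_files, fake_kernel_works, sample_size=len(all_files)))
```

                                                                                                                                                                            Because each agent can draw a different random subset (a different `seed` here), agents tend to land on different bugs instead of all piling onto the same one, which seems to be the point of the quoted fix.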

                                                                                                                                                                            • epolanski

                                                                                                                                                                              today at 8:34 PM

                                                                                                                                                                              However it was achieved, building such a complex project as a C compiler on a $20k budget in full autonomy is quite impressive.

                                                                                                                                                                              Imho some commenters focus so much on the cons (many, and honestly also acknowledged by the blog post) that they forget to be genuinely impressed by the steps forward.

                                                                                                                                                                              • cuechan

                                                                                                                                                                                today at 9:51 PM

                                                                                                                                                                                > The compiler is an interesting artifact on its own [...]

                                                                                                                                                                                It's funny because by (most) definitions, it is not an artifact:

                                                                                                                                                                                > a usually simple object (such as a tool or ornament) showing human workmanship or modification as distinguished from a natural object

                                                                                                                                                                                • yu3zhou4

                                                                                                                                                                                  today at 8:36 PM

                                                                                                                                                                                  At this point, I genuinely don't know what to learn next to not become obsolete when another Opus version gets released

                                                                                                                                                                                    • missingdays

                                                                                                                                                                                      today at 9:37 PM

                                                                                                                                                                                      Learn to fix bugs, it's gonna be more relevant than ever

                                                                                                                                                                                      • RivieraKid

                                                                                                                                                                                        today at 9:18 PM

                                                                                                                                                                                        I agree. I don't understand there are so many software engineers who are excited about this. I would only be excited if I was a founder in addition to being a software engineer.

                                                                                                                                                                                    • jwpapi

                                                                                                                                                                                      today at 10:28 PM

                                                                                                                                                                                      This is my favorite article this year. Just very insightful and honest. The learnings are worth thousands for me.

                                                                                                                                                                                      • hmry

                                                                                                                                                                                        today at 10:17 PM

                                                                                                                                                                                        If I, a human, read the source code of $THING and then later implement my own version, that's not a "clean-room" re-implementation. The whole point of "clean-room" is that no single person has access to both the original code and the new code. That way, you can legally prove that no copyright infringement took place.

                                                                                                                                                                                        But when an AI does it, now it counts? Opus is trained on the source code of Clang, GCC, TCC, etc. So this is absolutely not "clean-room".

                                                                                                                                                                                          • rishabhaiover

                                                                                                                                                                                            today at 10:22 PM

                                                                                                                                                                                            What life does one lead to be this sore in life

                                                                                                                                                                                              • hmry

                                                                                                                                                                                                today at 10:29 PM

                                                                                                                                                                                                Just tired of AI companies having more rights than natural people when it comes to copyright infringement. Let us have some of the fun too!

                                                                                                                                                                                                  • rishabhaiover

                                                                                                                                                                                                    today at 10:33 PM

                                                                                                                                                                                                    I apologize for making that assumption.

                                                                                                                                                                                            • bmandale

                                                                                                                                                                                              today at 10:21 PM

                                                                                                                                                                                              That's not the only way to protect yourself from accusations of copyright infringement. I remember reading that the GNU utils were designed to be as performant as possible in order to force themselves to structure the code differently from the unix originals.

                                                                                                                                                                                          • personjerry

                                                                                                                                                                                            today at 10:25 PM

                                                                                                                                                                                            > Over nearly 2,000 Claude Code sessions and $20,000 in API costs

                                                                                                                                                                                            Well there goes my weekend project plans

                                                                                                                                                                                            • gignico

                                                                                                                                                                                              today at 7:21 PM

                                                                                                                                                                                              > To stress test it, I tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V.

                                                                                                                                                                                              If you don't care about code quality, maintainability, readability, conformance to the specification, and performance of the compiler and of the compiled code, please, give me your $20,000, I'll give you your C compiler written from scratch :)

                                                                                                                                                                                                • chasd00

                                                                                                                                                                                                  today at 10:18 PM

                                                                                                                                                                                                  > If you don't care about code quality, maintainability, readability, conformance to the specification, and performance of the compiler and of the compiled code, please, give me your $20,000, I'll give you your C compiler written from scratch :)

                                                                                                                                                                                                  i don't know if you could. Let's say you get a check for $20k, how long will it take you to make an equivalent performing and compliant compiler? Are you going to put your life on pause until it's done for $20k? Who's going to pay your bills when the $20k is gone after 3 months?

                                                                                                                                                                                                  • minimaxir

                                                                                                                                                                                                    today at 7:24 PM

                                                                                                                                                                                                    There is an entire Evaluation section that addresses that criticism (both in agreement and disagreement).

                                                                                                                                                                                                    • 52-6F-62

                                                                                                                                                                                                      today at 7:24 PM

                                                                                                                                                                                                      If we're just writing off the billions in up front investment costs, they can just send all that my way while we're at it. No problem. Everybody happy.

                                                                                                                                                                                                  • polskibus

                                                                                                                                                                                                    today at 9:20 PM

                                                                                                                                                                                                    So did the Linux compiled with this compiler worked? Does it work the same as GCC-compiled Linux (but slower due to generating non optimized code?)

                                                                                                                                                                                                    • throwaway2027

                                                                                                                                                                                                      today at 9:35 PM

                                                                                                                                                                                                      Next time can you build a Rust compiler in C? It doesn't even have to check things or have a borrow checker, as long as it reduces the compile times so it's like a fast debug iteration compiler.

                                                                                                                                                                                                      • exitcode0000

                                                                                                                                                                                                        today at 9:17 PM

                                                                                                                                                                                                        Cool article, interesting to read about their challenges. I've tasked Claude with building an Ada83 compiler targeting LLVM IR - which has gotten pretty far.

                                                                                                                                                                                                        I am not using teams though and there is quite a bit of knowledge needed to direct it (even with the test suite).

                                                                                                                                                                                                        • jhallenworld

                                                                                                                                                                                                          today at 10:28 PM

                                                                                                                                                                                                          Does it make a conforming preprocessor?

                                                                                                                                                                                                          • geooff_

                                                                                                                                                                                                            today at 9:37 PM

                                                                                                                                                                                                            Maybe I'm naive, but I find these posts about re-engineering a complex product underwhelming. C compilers exist, and realistically Claude's training corpus contains a ton of C compiler code. The task is already perfectly defined. There exists a benchmark of well-adopted codebases that can be used to prove whether this is a working solution. Half the difficulty in making something is proving it works and is complete.

                                                                                                                                                                                                            IMO a simpler novel product that humans enjoy is 10x more impressive than rehashing a solved problem, regardless of difficulty.

                                                                                                                                                                                                              • bs7280

                                                                                                                                                                                                                today at 9:42 PM

                                                                                                                                                                                                                I don't see this as just exercise in making a new useful thing, but benchmarking the SOTA models ability to create a massive* project on its own, with some verifiable metrics of success. I believe they were able to build FFMPEG with this rust compiler?

                                                                                                                                                                                                                How much would it cost to pay someone to make a C compiler in Rust? A lot more than $20k.

                                                                                                                                                                                                                * massive meaning "total context needed" >> model context window

                                                                                                                                                                                                                • stephc_int13

                                                                                                                                                                                                                  today at 9:41 PM

                                                                                                                                                                                                                  This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.

                                                                                                                                                                                                                    • NitpickLawyer

                                                                                                                                                                                                                      today at 9:55 PM

                                                                                                                                                                                                                      And how long will it take before an open model recreates this. The "vibe" consensus before "thinking" models really took off was that open was ~6mo behind SotA. With the massive RL improvements, over the past 6 months I've thought the gap was actually increasing. This will be a nice little verifiable test going forward.

                                                                                                                                                                                                              • small_model

                                                                                                                                                                                                                today at 7:24 PM

                                                                                                                                                                                                                How about we get the LLM's to collaborate and design a perfect programming language for LLM coding, it would be terse (less tokens) easy for pattern searches etc and very fast to build, iterate over.

                                                                                                                                                                                                                  • WarmWash

                                                                                                                                                                                                                    today at 7:38 PM

                                                                                                                                                                                                                    I cannot decide if LLMs would be excellent at writing in pure binary (why waste all that context on superfluous variable names and function symbols) or be absolutely awful at writing pure binary (would get hopelessly lost without the huge diversification of tokens).

                                                                                                                                                                                                                      • anematode

                                                                                                                                                                                                                        today at 7:44 PM

                                                                                                                                                                                                                        Binary is wayyy less information dense than normal code, so it wouldn't work well at all.

                                                                                                                                                                                                                        • small_model

                                                                                                                                                                                                                          today at 8:27 PM

                                                                                                                                                                                                                          We would still need the language to be human readable, but it could be very dense. They could build the ultimate std lib, that goes directly to kernels, so a call like spawn is all the tokens it needs to start a co routine for example.

                                                                                                                                                                                                                      • copperx

                                                                                                                                                                                                                        today at 7:37 PM

                                                                                                                                                                                                                        I'm surprised by the assumption that LLMs would design such a language better than humans. I don't think that's the case.

                                                                                                                                                                                                                        • hagendaasalpine

                                                                                                                                                                                                                          today at 9:05 PM

                                                                                                                                                                                                                          What about APL et al. (BQN)? Information dense(?)

                                                                                                                                                                                                                        • throwaway2027

                                                                                                                                                                                                                          today at 7:42 PM

                                                                                                                                                                                                                          I think it's funny how me and I assume many others tried to do the same thing and they probably saw it being a popular query or had the same idea.

                                                                                                                                                                                                                          • stephc_int13

                                                                                                                                                                                                                            today at 9:15 PM

                                                                                                                                                                                                                            They should add this to the benchmark suite, and create a custom eval for how good the resulting compiler is, as well as how maintainable the source code.

                                                                                                                                                                                                                              • snek_case

                                                                                                                                                                                                                                today at 9:45 PM

                                                                                                                                                                                                                                This would be an expensive benchmark to run on a regular basis, though I guess for the big AI labs it's nothing. Code quality is hard to objectively measure, however.

                                                                                                                                                                                                                            • owenpalmer

                                                                                                                                                                                                                              today at 7:28 PM

                                                                                                                                                                                                                              It can compile the linux kernel, but does it boot?

                                                                                                                                                                                                                              • stephc_int13

                                                                                                                                                                                                                                today at 9:40 PM

                                                                                                                                                                                                                                It means that if you already have or a willing to build very robust test suite and the task is a complicated but already solved problem, you can get a sub-par implementation for a semi-reasonable amount of money.

                                                                                                                                                                                                                                This is not entirely ridiculous.

                                                                                                                                                                                                                                • falloutx

                                                                                                                                                                                                                                  today at 7:29 PM

                                                                                                                                                                                                                                  So it copied one of the C compilers? This was always possible but now you need to pay $1000 in API costs to Anthropic

                                                                                                                                                                                                                                    • Rudybega

                                                                                                                                                                                                                                      today at 9:22 PM

                                                                                                                                                                                                                                      It wrote the compiler in Rust. As far as I know, there aren't any Rust based C compilers with the same capabilities. If you can find one that can compile the Linux kernel or get 99% on the GCC torture test suite, I would be quite surprised. I couldn't in a search.

                                                                                                                                                                                                                                      Maybe read the article before being so dismissive.

                                                                                                                                                                                                                                        • hgs3

                                                                                                                                                                                                                                          today at 9:42 PM

                                                                                                                                                                                                                                          > As far as I know, there aren't any Rust based C compilers with the same capabilities.

                                                                                                                                                                                                                                          If you trained on a neutral representation like an AST or IR, then the source language shouldn't matter. *

                                                                                                                                                                                                                                          * I'm not familiar with how Anthropic builds their models, but training this way should nullify PL differences.

                                                                                                                                                                                                                                          • falloutx

                                                                                                                                                                                                                                            today at 9:58 PM

                                                                                                                                                                                                                                            Why does language of the compiler matter? Its a solved problem and since other implementations are already available anyone can already transpile them to rust.

                                                                                                                                                                                                                                              • Rudybega

                                                                                                                                                                                                                                                today at 10:03 PM

                                                                                                                                                                                                                                                Direct transpilation would create a ton of unsafe code (this repo doesn't have any) and fixing that would require a lot of manual fixes from the model. Even that would be a massive achievement, but it's not how this was created.

                                                                                                                                                                                                                                        • chucksta

                                                                                                                                                                                                                                          today at 7:48 PM

                                                                                                                                                                                                                                          Add a 0 and double it

                                                                                                                                                                                                                                          > Over nearly 2,000 Claude Code sessions and $20,000 in API costs

                                                                                                                                                                                                                                            • lossyalgo

                                                                                                                                                                                                                                              today at 10:11 PM

                                                                                                                                                                                                                                              One more reason RAM prices will continue to go up.

                                                                                                                                                                                                                                      • IshKebab

                                                                                                                                                                                                                                        today at 9:58 PM

                                                                                                                                                                                                                                        > I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

                                                                                                                                                                                                                                        This has been my experience of vibe coding too. Good for getting started, but you quickly reach the point where fixing one thing breaks another and you have to finish the project yourself.

                                                                                                                                                                                                                                        • 7734128

                                                                                                                                                                                                                                          today at 7:28 PM

                                                                                                                                                                                                                                          I'm sure this is impressive, but it's probably not the best test case given how many C compilers there are out there and how they presumably have been featured in the training data.

                                                                                                                                                                                                                                          This is almost like asking me to invent a path finding algorithm when I've been thought Dijkstra's and A*.

                                                                                                                                                                                                                                            • NitpickLawyer

                                                                                                                                                                                                                                              today at 7:30 PM

                                                                                                                                                                                                                                              It's a bit disappointing that people are still re-hashing the same "it's in the training data" old thing from 3 years ago. It's not like any LLM could 1for1 regurgitate millions of LoC from any training set... This is not how it works.

                                                                                                                                                                                                                                              A pertinent quote from the article (which is a really nice read, I'd recommend reading it fully at least once):

                                                                                                                                                                                                                                              > Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects. My goal with Opus 4.6 was to again test the limits.

                                                                                                                                                                                                                                                • simonw

                                                                                                                                                                                                                                                  today at 9:28 PM

                                                                                                                                                                                                                                                  This is a good rebuttal to the "it was in the training data" argument - if that's how this stuff works, why couldn't Opus 4.5 or any of the other previous models achieve the same thing?

                                                                                                                                                                                                                                                  • wmf

                                                                                                                                                                                                                                                    today at 8:02 PM

                                                                                                                                                                                                                                                    In this case it's not reproducing training data verbatim but it probably is using algorithms and data structures that were learned from existing C compilers. On one hand it's good to reuse existing knowledge but such knowledge won't be available if you ask Claude to develop novel software.

                                                                                                                                                                                                                                                      • RobMurray

                                                                                                                                                                                                                                                        today at 8:40 PM

                                                                                                                                                                                                                                                        How often do you need to invent novel algorithms or data structures? Most human written code is just rehashing existing ideas as well.

                                                                                                                                                                                                                                                          • notnullorvoid

                                                                                                                                                                                                                                                            today at 10:13 PM

                                                                                                                                                                                                                                                            I wouldn't say I need to invent much that is strictly novel, though I often iterate on what exists and delve into novel-ish territory. That being said I'm definitely in a minority where I have the luxury/opportunity to work outside the monotony of average programming.

                                                                                                                                                                                                                                                            The part I find concerning is that I wouldn't be in the place I am today without spending a fair amount of time in that monotony and really delving in to understand it and slowly push outside it's boundary. If I was starting programming today I can confidently say I would've given up.

                                                                                                                                                                                                                                                            • lossolo

                                                                                                                                                                                                                                                              today at 8:47 PM

                                                                                                                                                                                                                                                              They're very good at reiterating, that's true. The issue is that without the people outside of "most humans" there would be no code and no civilization. We'd still be sitting in trees. That is real intelligence.

                                                                                                                                                                                                                                                                • ben_w

                                                                                                                                                                                                                                                                  today at 9:49 PM

                                                                                                                                                                                                                                                                  Why's that the issue?

                                                                                                                                                                                                                                                                  "This AI can do 99.99%* of all human endeavours, but without that last 0.01% we'd still be in the trees", doesn't stop that 99.99% getting made redundant by the AI.

                                                                                                                                                                                                                                                                  * vary as desired for your preference of argument, regarding how competent the AI actually is vs. how few people really show "true intelligence". Personally I think there's a big gap between them: paradigm-shifting inventiveness is necessarily rare, and AI can't fill in all the gaps under it yet. But I am very uncomfortable with how much AI can fill in for.

                                                                                                                                                                                                                                                      • lossolo

                                                                                                                                                                                                                                                        today at 8:50 PM

                                                                                                                                                                                                                                                        They couldn't do it because they weren't fine-tuned for multi-agent workflows, which basically means they were constrained by their context window.

                                                                                                                                                                                                                                                        How many agents did they use with previous Opus? 3?

                                                                                                                                                                                                                                                        You've chosen an argument that works against you, because they actually could do that if they were trained to.

                                                                                                                                                                                                                                                        Give them the same post-training (recipes/steering) and the same datasets, and voila, they'll be capable of the same thing. What do you think is happening there? Did Anthropic inject magic ponies?

                                                                                                                                                                                                                                                        • calebhwin

                                                                                                                                                                                                                                                          today at 7:33 PM

                                                                                                                                                                                                                                                          [dead]

                                                                                                                                                                                                                                                          • zephen

                                                                                                                                                                                                                                                            today at 7:44 PM

                                                                                                                                                                                                                                                            > It's a bit disappointing that people are still re-hashing the same "it's in the training data" old thing from 3 years ago.

                                                                                                                                                                                                                                                            They only have to keep reiterating this because people are still pretending the training data doesn't contain all the information that it does.

                                                                                                                                                                                                                                                            > It's not like any LLM could 1for1 regurgitate millions of LoC from any training set... This is not how it works.

                                                                                                                                                                                                                                                            Maybe not any old LLM, but Claude gets really close.

                                                                                                                                                                                                                                                            https://arxiv.org/pdf/2601.02671v1

                                                                                                                                                                                                                                                            • skydhash

                                                                                                                                                                                                                                                              today at 7:42 PM

                                                                                                                                                                                                                                                              Because for all those projects, the effective solution is to just use the existing implementation and not launder code through an LLM. We would rather see a stab at fixing CVEs or implementing features in open source projects. Like the wifi situation in FreeBSD.

                                                                                                                                                                                                                                                            • falloutx

                                                                                                                                                                                                                                                              today at 8:14 PM

                                                                                                                                                                                                                                                              They can literally print out entire books line by line.

                                                                                                                                                                                                                                                              • lunar_mycroft

                                                                                                                                                                                                                                                                today at 7:50 PM

                                                                                                                                                                                                                                                                LLMs can regurgitate almost all of the Harry Potter books, among others [0]. Clearly, these models can actually regurgitate large amounts of their training data, and reconstructing any gaps would be a lot less impressive than implementing the project truly from scratch.

                                                                                                                                                                                                                                                                (I'm not claiming this is what actually happened here, just pointing out that memorization is a lot more plausible/significant than you say)

                                                                                                                                                                                                                                                                [0] https://www.theregister.com/2026/01/09/boffins_probe_commerc...

                                                                                                                                                                                                                                                                  • StilesCrisis

                                                                                                                                                                                                                                                                    today at 8:08 PM

                                                                                                                                                                                                                                                                    The training data doesn't contain a Rust based C compiler that can build Linux, though.

                                                                                                                                                                                                                                                        • sho_hn

                                                                                                                                                                                                                                                          today at 7:24 PM

                                                                                                                                                                                                                                                          Nothing in the post about whether the compiled kernel boots.

                                                                                                                                                                                                                                                            • chews

                                                                                                                                                                                                                                                              today at 7:55 PM

                                                                                                                                                                                                                                                              video does show it booting.

                                                                                                                                                                                                                                                          • gre

                                                                                                                                                                                                                                                            today at 7:28 PM

                                                                                                                                                                                                                                                            There's a terrible bug where once it compacts then it sometimes pulls in .o or binary files and immediately fills your entire context. Then it compacts again...10m and your token budget is gone for the 5 hour period. edit: hooks that prevent it from reading binary files can't prevent this.

                                                                                                                                                                                                                                                            Please fix.. :)

                                                                                                                                                                                                                                                            • light_hue_1

                                                                                                                                                                                                                                                              today at 7:51 PM

                                                                                                                                                                                                                                                              > This was a clean-room implementation (Claude did not have internet access at any point during its development);

                                                                                                                                                                                                                                                              This is absolutely false and I wish the people doing these demonstrations were more honest.

                                                                                                                                                                                                                                                              It had access to GCC! Not only that, using GCC as an oracle was critical and had to be built in by hand.

                                                                                                                                                                                                                                                              Like the web browser project this shows how far you can get when you have a reference implementation, good benchmarks, and clear metrics. But that's not the real world for 99% of people, this is the easiest scenario for any ML setting.
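For readers unfamiliar with the "GCC as an oracle" idea mentioned above, here is a minimal sketch of differential testing: compile the same program with a trusted reference compiler and with the compiler under test, run both binaries, and flag any program where the observable behavior differs. This is only an illustration, not the harness the project used. The names `compile_and_run` and `find_mismatches` and the candidate path `./my-cc` are hypothetical, and it assumes both compilers accept a `cc file.c -o out` style command line.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, NamedTuple


class RunResult(NamedTuple):
    exit_code: int
    stdout: bytes


def compile_and_run(compiler: str, source: str, timeout: float = 10.0) -> RunResult:
    """Compile `source` with `compiler`, run the binary, and capture its behavior.

    Assumes the compiler accepts `<compiler> case.c -o case` (an assumption).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "case.c"
        exe = Path(tmp) / "case"
        src.write_text(source)
        subprocess.run([compiler, str(src), "-o", str(exe)], check=True, timeout=timeout)
        proc = subprocess.run([str(exe)], capture_output=True, timeout=timeout)
        return RunResult(proc.returncode, proc.stdout)


def find_mismatches(
    sources: Iterable[str],
    reference: Callable[[str], RunResult],
    candidate: Callable[[str], RunResult],
) -> List[str]:
    """Return the test programs on which the candidate disagrees with the oracle."""
    return [s for s in sources if reference(s) != candidate(s)]
```

With a real toolchain this might be called as `find_mismatches(cases, lambda s: compile_and_run("gcc", s), lambda s: compile_and_run("./my-cc", s))`. Taking the two runners as plain callables keeps the comparison logic testable without either compiler installed.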

                                                                                                                                                                                                                                                                • rvz

                                                                                                                                                                                                                                                                  today at 9:45 PM

                                                                                                                                                                                                                                                                  > This is absolutely false and I wish the people doing these demonstrations were more honest.

                                                                                                                                                                                                                                                                  That's because the "testing" was not done independently. So anything can be possibly be made to be misleading. Hence:

                                                                                                                                                                                                                                                                  > Written by Nicholas Carlini, a researcher on our Safeguards team.

                                                                                                                                                                                                                                                              • dmitrygr

                                                                                                                                                                                                                                                                today at 7:23 PM

                                                                                                                                                                                                                                                                > The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

                                                                                                                                                                                                                                                                Worse than "-O0" takes skill...

                                                                                                                                                                                                                                                                So then, it produced something much worse than tcc (which is better than gcc -O0), an equivalent of which one man can produce in under two weeks. So even all those tokens and dollars did not equal one man's week of work.

                                                                                                                                                                                                                                                                Except the one man might explain such arbitrary and shitty code as this:

                                                                                                                                                                                                                                                                https://github.com/anthropics/claudes-c-compiler/blob/main/s...

                                                                                                                                                                                                                                                                why x9? who knows?!

                                                                                                                                                                                                                                                                Oh god the more i look at this code the happier I get. I can already feel the contracts coming to fix LLM slop like this when any company who takes this seriously needs it maintained and cannot...

                                                                                                                                                                                                                                                                  • ben_w

                                                                                                                                                                                                                                                                    today at 7:43 PM

                                                                                                                                                                                                                                                                    I'm trying to recall a quote. Some war where all defeats were censored in the news, possibly Paris was losing to someone. It was something along the lines of "I can't help but notice how our great victories keep getting closer to home".

                                                                                                                                                                                                                                                                    Last year I tried using an LLM to make a joke language, I couldn't even compile the compiler the source code was so bad. Before Christmas, same joke language, a previous version of Claude gave me something that worked. I wouldn't call it "good", it was a joke language, but it did work.

                                                                                                                                                                                                                                                                    So it sucks at writing a compiler? Yay. The gloriously indefatigable human mind wins another battle against the mediocre AI, but I can't help but notice how the battles keep getting closer to home.

                                                                                                                                                                                                                                                                      • sjsjsbsh

                                                                                                                                                                                                                                                                        today at 7:49 PM

                                                                                                                                                                                                                                                                        > but I can't help but notice how the battles keep getting closer to home

                                                                                                                                                                                                                                                                        This has been true for all of (known) human history. I’m gonna go ahead and make another bold prediction: tech will keep getting better.

                                                                                                                                                                                                                                                                        The issue with this blog post is it’s mostly marketing.

                                                                                                                                                                                                                                                                    • sebzim4500

                                                                                                                                                                                                                                                                      today at 7:27 PM

                                                                                                                                                                                                                                                                      Can one man really make a C compiler in one week that can compile linux, sqlite, etc.?

                                                                                                                                                                                                                                                                      Maybe I'm underestimating the simplicity of the C language, but that doesn't sound very plausible to me.

                                                                                                                                                                                                                                                                        • dmitrygr

                                                                                                                                                                                                                                                                          today at 7:28 PM

                                                                                                                                                                                                                                                                          yes, if you do not care to optimize, yes. source: done it

                                                                                                                                                                                                                                                                            • Philpax

                                                                                                                                                                                                                                                                              today at 7:28 PM

                                                                                                                                                                                                                                                                              I would love to see the commit log on this.

                                                                                                                                                                                                                                                                                • rustystump

                                                                                                                                                                                                                                                                                  today at 7:41 PM

                                                                                                                                                                                                                                                                                  Implementing just enough to conform to a language is not as difficult as it seems. Making it fast is hard.

                                                                                                                                                                                                                                                                                  • dmitrygr

                                                                                                                                                                                                                                                                                    today at 7:29 PM

                                                                                                                                                                                                                                                                                    did this before i knew how to git, back in college. target was ARMv5

                                                                                                                                                                                                                                                                                      • Philpax

                                                                                                                                                                                                                                                                                        today at 7:48 PM

                                                                                                                                                                                                                                                                                        Great. Did your compiler support three different architectures (four, if you include x86 in addition to x86-64) and compile and pass the test suite for all of this software?

                                                                                                                                                                                                                                                                                        > Projects that compile and pass their test suites include PostgreSQL (all 237 regression tests), SQLite, QuickJS, zlib, Lua, libsodium, libpng, jq, libjpeg-turbo, mbedTLS, libuv, Redis, libffi, musl, TCC, and DOOM — all using the fully standalone assembler and linker with no external toolchain. Over 150 additional projects have also been built successfully, including FFmpeg (all 7331 FATE checkasm tests on x86-64 and AArch64), GNU coreutils, Busybox, CPython, QEMU, and LuaJIT.

                                                                                                                                                                                                                                                                                        Writing a C compiler is not that difficult, I agree. Writing a C compiler that can compile a significant amount of real software across multiple architectures? That's significantly more non-trivial.

                                                                                                                                                                                                                                                                        • bwfan123

                                                                                                                                                                                                                                                                          today at 9:51 PM

                                                                                                                                                                                                                                                                          > I can already feel the contracts coming to fix LLM slop

                                                                                                                                                                                                                                                                          First, the agents will attempt to fix issues on their own. Most easy problems will be fixed or worked around in this manner. The hard problems will require a deeper causal model of how things work. For these, the agents will give up. But by then the codebase has evolved to a point where no one understands what's going on, including the agents and their human handlers. Expect your phone to ring at that point, and prepare to ask for a ransom.

                                                                                                                                                                                                                                                                          • small_model

                                                                                                                                                                                                                                                                            today at 7:27 PM

                                                                                                                                                                                                                                                                            Claude is only a few years old so we should compare it to a 3 year old human's C compiler

                                                                                                                                                                                                                                                                              • zephen

                                                                                                                                                                                                                                                                                today at 7:48 PM

                                                                                                                                                                                                                                                                                Claude contains the entire wisdom of the internet, such as it is.

                                                                                                                                                                                                                                                                            • sjsjsbsh

                                                                                                                                                                                                                                                                              today at 7:32 PM

                                                                                                                                                                                                                                                                              > I can already feel the contracts coming to fix LLM slop like this when any company who takes this seriously needs it maintained and cannot

                                                                                                                                                                                                                                                                              Honest question, do you think it’d be easier to fix or rewrite from scratch? With domains I’m intimately familiar with, I’ve come very close to simply throwing the LLM code out after using it to establish some key test cases.

                                                                                                                                                                                                                                                                                • dmitrygr

                                                                                                                                                                                                                                                                                  today at 8:10 PM

                                                                                                                                                                                                                                                                                  Rewrite is what I’ve been doing so far in such cases. Takes fewer hours

                                                                                                                                                                                                                                                                          • hrgadyx

                                                                                                                                                                                                                                                                            today at 7:38 PM

                                                                                                                                                                                                                                                                            [flagged]

                                                                                                                                                                                                                                                                              • falcor84

                                                                                                                                                                                                                                                                                today at 7:59 PM

                                                                                                                                                                                                                                                                                They didn't "steal" open source code any more than I stole my copy of The Odyssey.

                                                                                                                                                                                                                                                                            • sjsjsbsh

                                                                                                                                                                                                                                                                              today at 7:25 PM

                                                                                                                                                                                                                                                                              > So, while this experiment excites me, it also leaves me feeling uneasy. Building this compiler has been some of the most fun I’ve had recently, but I did not expect this to be anywhere near possible so early in 2026

                                                                                                                                                                                                                                                                              What? Didn’t cursed lang do something similar like 6 or 7 months ago? These bombastic marketing tactics are getting tired.

                                                                                                                                                                                                                                                                                • ebiester

                                                                                                                                                                                                                                                                                  today at 7:58 PM

                                                                                                                                                                                                                                                                                  Do you not see the difference between a toy language and a clean room implementation that can compile Linux, QEMU, Postgres, and sqlite? (No, it doesn't have the assembler and linker.)

                                                                                                                                                                                                                                                                                  That's for $20,000.

                                                                                                                                                                                                                                                                                    • falloutx

                                                                                                                                                                                                                                                                                      today at 8:30 PM

                                                                                                                                                                                                                                                                                      people have built compilers for free, with $20000 you can even a couple of devs for a year in low income countries.

                                                                                                                                                                                                                                                                                  • jsnell

                                                                                                                                                                                                                                                                                    today at 7:55 PM

                                                                                                                                                                                                                                                                                    No? That was a frontend for a toy language calling using LLVM as the backend. This is a totally self-contained compiler that's capable of compiling the Linux kernel. What's the part that you think is similar?

                                                                                                                                                                                                                                                                                • trilogic

                                                                                                                                                                                                                                                                                  today at 7:27 PM

                                                                                                                                                                                                                                                                                  Can it create employment? How is this making life better. I understand the achievement but come on, wouldn“t it be something to show if you created employment for 10000 people using your 20000 USD!

                                                                                                                                                                                                                                                                                  Microsoft, OpenAI, Anthropic, XAI, all solving the wrong problems, your problems not the collective ones.

                                                                                                                                                                                                                                                                                    • jeffbee

                                                                                                                                                                                                                                                                                      today at 7:40 PM

                                                                                                                                                                                                                                                                                      "Employment" is not intrinsically valuable. It is an emergent property of one way of thinking about economic systems.

                                                                                                                                                                                                                                                                                        • trilogic

                                                                                                                                                                                                                                                                                          today at 7:50 PM

                                                                                                                                                                                                                                                                                          For employment I mean "WHATEVER LEADS TO REWARD COLLECTIVE HUMANS TO SURVIVE".

                                                                                                                                                                                                                                                                                          Call it as you wish, but I am certainly not talking about coding values.

                                                                                                                                                                                                                                                                                            • falcor84

                                                                                                                                                                                                                                                                                              today at 8:05 PM

                                                                                                                                                                                                                                                                                              I'm struggling to even parse the syntax of "WHATEVER LEADS TO REWARD COLLECTIVE HUMANS TO SURVIVE", but assuming that you're talking about resource allocation, my answer is UBI or something similar to it. We only need to "reward" for action when the resources are scarce, but when resources are plentiful, there's no particular reason not to just give them out.

                                                                                                                                                                                                                                                                                              I know it's "easier to imagine an end to the world than an end to capitalism", but to quote another dreamer: "Imagine all the people sharing all the world".

                                                                                                                                                                                                                                                                                                • swexbe

                                                                                                                                                                                                                                                                                                  today at 10:11 PM

                                                                                                                                                                                                                                                                                                  Except resources won't be plentiful for a long while since AI is only impacting the service sector. You can't eat a service, you can't live in one. SAAS will get very cheap though...

                                                                                                                                                                                                                                                                                                    • falcor84

                                                                                                                                                                                                                                                                                                      today at 10:47 PM

                                                                                                                                                                                                                                                                                                      Robotics has been advancing very quickly recently. If we solve long-term AI action planning, I don't see any limitation to making it embodied.

                                                                                                                                                                                                                                                                                      • mofeien

                                                                                                                                                                                                                                                                                        today at 8:19 PM

                                                                                                                                                                                                                                                                                        Obviously a human in the loop is always needed and this technology that is specifically trained to excel at all cognitive tasks that humans are capable of will lead to infinite new jobs being created. /s

                                                                                                                                                                                                                                                                                    • today at 7:23 PM

                                                                                                                                                                                                                                                                                        • today at 7:27 PM

                                                                                                                                                                                                                                                                                      • chvid

                                                                                                                                                                                                                                                                                        today at 7:43 PM

                                                                                                                                                                                                                                                                                        100.000 lines of code for something that is literally a text book task?

                                                                                                                                                                                                                                                                                        I guess if it only created 1.000 lines it would be easy to see where those lines came from.

                                                                                                                                                                                                                                                                                          • falcor84

                                                                                                                                                                                                                                                                                            today at 7:56 PM

                                                                                                                                                                                                                                                                                            > literally a text book task

                                                                                                                                                                                                                                                                                            Generating a 99% compliant C compiler is not a textbook task in any university I've ever heard of. There's a vast difference between a toy compiler and one that can actually compile Linux and Doom.

                                                                                                                                                                                                                                                                                            From a bit of research now, there are only three other compilers that can compile an unmodified Linux kernel: GCC, Clang/LLVM and Intel's oneAPI. I can't find any other compiler implementation that came close.

                                                                                                                                                                                                                                                                                              • cv5005

                                                                                                                                                                                                                                                                                                today at 8:18 PM

                                                                                                                                                                                                                                                                                                That's because you need to implement a bunch of gcc-specific behavior that linux relies on. A 100% standards compliant c23 compiler can't compile linux.

                                                                                                                                                                                                                                                                                            • blibble

                                                                                                                                                                                                                                                                                              today at 10:49 PM

                                                                                                                                                                                                                                                                                              indeed

                                                                                                                                                                                                                                                                                              a working C compiler is literally in my "teach yourself C in 24 hours" book from 30 years ago

                                                                                                                                                                                                                                                                                              • anematode

                                                                                                                                                                                                                                                                                                today at 7:52 PM

                                                                                                                                                                                                                                                                                                A simple C89 compiler is a textbook task; a GCC-compatible compiler targeting multiple architectures that can pass 99% of the GCC torture test suite is absolutely not.

                                                                                                                                                                                                                                                                                                • wmf

                                                                                                                                                                                                                                                                                                  today at 7:54 PM

                                                                                                                                                                                                                                                                                                  This has multiple backends and a long tail of C extensions that are not in the textbook.

                                                                                                                                                                                                                                                                                              • fxtentacle

                                                                                                                                                                                                                                                                                                today at 7:57 PM

                                                                                                                                                                                                                                                                                                You could hire a reasonably skilled dev in India for a week for $1k —- or you could pay $20k in LLM tokens, spend 2 hours writing essays to explain what you want, and then get a buggy mess.

                                                                                                                                                                                                                                                                                                  • Philpax

                                                                                                                                                                                                                                                                                                    today at 8:03 PM

                                                                                                                                                                                                                                                                                                    No human developer, not even Fabrice Bellard, could reproduce this specific result in a week. A subset of it, sure, but not everything this does.

                                                                                                                                                                                                                                                                                                      • falloutx

                                                                                                                                                                                                                                                                                                        today at 8:10 PM

                                                                                                                                                                                                                                                                                                        just forked https://github.com/Vexu/arocc and it took me 5 seconds to complete it.

                                                                                                                                                                                                                                                                                                          • defen

                                                                                                                                                                                                                                                                                                            today at 10:36 PM

                                                                                                                                                                                                                                                                                                            That can't build the Linux kernel though.