Jank now has its own custom IR

136 points - last Friday at 5:17 PM

  • pjmlp

    today at 7:32 AM

    The natural evolution of compiler toolchains that live long enough on top of LLVM: eventually every one of them matures into having its own IR.

    Even clang is now in the process of doing the same.

    > We're going to use Clojure JVM to get our baseline benchmark numbers and then we'll aim to beat those numbers with jank.

    > Note that all numbers in this post are measured on my five year old x86_64 desktop with an AMD Ryzen Threadripper 2950X on NixOS with OpenJDK 21. When I say "JVM" in this post, I mean OpenJDK 21.

    In 2026, a better baseline would be the Java 26 implementations of OpenJDK, OpenJ9, and GraalVM, with JIT caches reused across several execution runs.

    > In the native world, we don't currently have JIT optimization. It could exist, but LLVM doesn't have any implementation for it and neither does any major C or C++ compiler

    Yes, they kind of do. That is partially what PGO is used for: capturing the program's behaviour during training runs and feeding it back into the compilation toolchain.

    Also, while it isn't native code per se, when C or C++ targets bytecode environments like IBM i, WebAssembly, or the CLR, among others, there is certainly the possibility of having a JIT in the picture.

    > Finally, just because jank is written in C++ doesn't mean that we can escape Clojure's semantics. Clojure is dynamically typed, garbage collected, and polymorphic as all get out.

    Which is why benchmarks should also take Common Lisp and Scheme compilers into account.

    Anyway, great piece of work and a very interesting post to read; best wishes to the author in finding some support.

    • christophilus

      today at 12:06 AM

      > we're using it to optimize jank to compete with the JVM

      The JVM gets a lot of hate, but that is a very high bar. The JVM is a serious piece of kit. I hope Jank succeeds. I'd love to use it in real projects.

        • pjmlp

          today at 7:30 AM

          Additionally, there are many JVMs to choose from. People often make the mistake of equating the JVM with OpenJDK, which is like talking about C and only considering GCC.

          Other JVMs have plenty of goodies: some have had AOT compilation for about 20 years now, others have real-time GC, others had JIT caches before Project Leyden was even an idea, others have actual value types as an experiment (ObjectLayout on Azul), pauseless GC, cloud-based JIT compilers, or bare-metal deployments, and ART also somehow has its goodies despite everything. There is a whole world that gets lost when people focus too much on JVM == OpenJDK.

            • let_rec

              today at 8:41 AM

              On the other hand, the JVM spec may prohibit some optimizations you are after. It's very dynamic after all!

                • pjmlp

                  today at 8:51 AM

                  Not really; that is the usual argument for why CPython is slow.

                  If anything, runtimes like the various JVM implementations, alongside the CLR and JS engines, are the bleeding edge of dynamic (JIT) compiler optimizations for dynamic runtimes.

                  That is something that gets lost when talking about Java: yes, the programming language looks like C++, but the JVM itself is heavily inspired by the dynamic semantics of Smalltalk and Objective-C.

                  Coming back to the spec, you will notice that it doesn't mention how threads are implemented, what kind of AOT/JIT is available, or which GC algorithms to implement, leaving enough room for implementations.

                  One area where you are actually right, which I just remembered while typing this, is the way reflection or unsafe code hinders some optimizations. Hence the ongoing changes: JNI or FFM access has to be explicitly enabled at startup, dynamic agents also have to be explicitly enabled, and the upcoming "final means final" (no more changing final fields via reflection).

                    • truth_seeker

                      today at 9:07 AM

                      What really matters is:

                      How far can I get in language X by writing just idiomatic code?

                      How much do the SDK and community libraries and frameworks help me run my program at bare-metal speed?

                      What changes do I have to make to existing libraries, frameworks, and my legacy code for CPU, IO, and memory efficiency as I migrate to a new version?

                        • pjmlp

                          today at 9:22 AM

                          That is only part of the picture; the other part, which seems quite forgotten nowadays, is:

                          - how much people actually care about algorithms and data structures

                          - do they actually know what options their tools have available

                          - have they ever spent at least an hour reading the man pages, info pages, or HTML documentation

                          - have they ever used a profiler, a graphical debugger, an advanced IDE

      • lemming

        today at 12:20 AM

        Great article, as always.

        There is one thing that I think is important to bear in mind when discussing inlining, especially in the context of Clojure. This is that once a function has been inlined, you can no longer update the definition of that function in the REPL and have that update the behaviour of functions which use it, unless you recompile those as well. This is not a criticism of course, it’s just part of the natural tension between dynamism and performance.

          • thfuran

            today at 1:22 AM

            Does that not happen automatically? I know there are contexts in which the JVM will deoptimize inlined code and recompile, like in response to class loading that causes a call site that was previously provably monomorphic to no longer be monomorphic.
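The speculate-then-fall-back idea thfuran describes can be pictured with a small C++ sketch, assuming a hypothetical `Var` slot that carries a version counter: a call site inlines the body it expects behind a cheap version guard and falls back to the indirect call once a redefinition breaks that assumption. This is only an analogy, not how JVM Clojure or jank handle redefinition; HotSpot, for instance, typically records dependencies and deoptimizes the compiled code when class loading invalidates them, rather than re-checking a guard on every call.

```cpp
#include <cassert>
#include <cstdint>

using Fn = std::int64_t (*)(std::int64_t);

// Hypothetical "var": the current definition plus a version counter that is
// bumped on every redefinition.
struct Var {
  Fn fn;
  std::uint64_t version = 0;
  void redefine(Fn f) { fn = f; ++version; }
};

std::int64_t inc_v1(std::int64_t x) { return x + 1; }
std::int64_t inc_v2(std::int64_t x) { return x + 100; }

Var inc_var{inc_v1};

// A call site compiled under the assumption that inc_var still holds inc_v1.
struct GuardedCallSite {
  std::uint64_t assumed_version;
  int slow_path_hits = 0;

  std::int64_t call(std::int64_t x) {
    if (inc_var.version == assumed_version) {
      return x + 1;  // speculative fast path: inc_v1's body, inlined
    }
    ++slow_path_hits;  // assumption broken: go through the var instead
    return inc_var.fn(x);
  }
};
```

Creating `GuardedCallSite site{inc_var.version};` and calling `site.call(1)` takes the inlined path; after `inc_var.redefine(inc_v2)`, the same call site notices the version change and picks up the new definition.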

              • lemming

                today at 3:30 AM

                No, it doesn't. In JVM Clojure's case, the vars are usually compiled to the moral equivalent of a global variable holding a pointer to a function. This allows you to update the function if the developer redefines it in the REPL, but it comes at a performance cost (the JVM can't inline it or otherwise optimise it). Clojure also allows you to compile with "direct linking", e.g. for production deployments, where you know you're unlikely to be wanting to dynamically update the code. In those cases defns are compiled down to static methods which call each other - much faster since the JVM can perform its magic with them, but you can't update them at the REPL.

                I'm unsure exactly how jank works WRT this tradeoff, but the article makes it sound like it's closer to the direct linking version, but with the inlining etc being done by jank rather than the JVM. I don't know if this is only for AOT or also in JIT cases.
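A minimal C++ sketch of the tradeoff lemming describes, with hypothetical names: `caller_via_var` loads the current function pointer on every call (the var-style indirection), while `caller_direct` names its callee statically, the way direct linking does, so the compiler is free to inline it, but a later "REPL redefinition" is never seen.

```cpp
#include <cassert>
#include <cstdint>

using Fn = std::int64_t (*)(std::int64_t);

std::int64_t square_v1(std::int64_t x) { return x * x; }
std::int64_t square_v2(std::int64_t x) { return x * x * x; }  // the "redefinition"

// Var style: a global slot holding a pointer to the current definition.
Fn square_var = square_v1;

// Indirect call through the slot: sees redefinitions, hard to inline.
std::int64_t caller_via_var(std::int64_t x) { return square_var(x) + 1; }

// Direct-linking style: the callee is fixed at compile time, so it can be
// inlined, but reassigning square_var has no effect on this function.
std::int64_t caller_direct(std::int64_t x) { return square_v1(x) + 1; }
```

After `square_var = square_v2;`, `caller_via_var(3)` returns 28 while `caller_direct(3)` still returns 10, which is exactly why a directly linked caller has to be recompiled to pick up the change.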

          • CalChris

            today at 2:30 AM

            The natural question is why doesn't Jank use MLIR?

              • pjmlp

                today at 7:31 AM

                No language that uses MLIR uses it directly out of the box; in that sense, the right question is why Jank did not create its own MLIR dialect.

            • mccoyb

              today at 12:09 AM

              Hoping to understand this better:

              > Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.

              In my mind, what is happening here is that you lower Clojure code into LLVM IR with a bunch of runtime calls (e.g. your `jank::runtime::dynamic_call`), i.e. LLVM-generated code invoking the runtime over a C ABI.

              If that's true, are there any optimizations that LLVM helps out with? Perhaps DCE? I can't tell immediately; curious about the answer.

              (question is obviously about the pre-IR state of things)
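To make the mental model in the question concrete, here is a C++ sketch of that kind of lowering, assuming a hypothetical boxed `Object` type and hypothetical C-ABI runtime entry points `rt_box_int`, `rt_box_real`, and `rt_add` (these are not jank's real runtime API). Every operation becomes a call on boxed values that inspects type tags and allocates its result. When such functions live in a separate runtime library, LLVM cannot see their bodies from the generated code, so it generally cannot fold or eliminate those calls; it can still optimize the glue around them.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical boxed value; a real runtime's object model will differ.
struct Object {
  enum class Tag { Integer, Real } tag;
  std::int64_t i = 0;
  double r = 0.0;
};
using ObjectRef = const Object*;

// Keeps every box alive for the duration of the sketch (no GC here).
static std::vector<std::unique_ptr<Object>> arena;

extern "C" ObjectRef rt_box_int(std::int64_t v) {
  arena.push_back(std::make_unique<Object>(Object{Object::Tag::Integer, v, 0.0}));
  return arena.back().get();
}

extern "C" ObjectRef rt_box_real(double v) {
  arena.push_back(std::make_unique<Object>(Object{Object::Tag::Real, 0, v}));
  return arena.back().get();
}

static double as_double(ObjectRef o) {
  return o->tag == Object::Tag::Integer ? static_cast<double>(o->i) : o->r;
}

// Polymorphic addition: decides what to do by inspecting tags at run time.
extern "C" ObjectRef rt_add(ObjectRef a, ObjectRef b) {
  if (a->tag == Object::Tag::Integer && b->tag == Object::Tag::Integer) {
    return rt_box_int(a->i + b->i);
  }
  return rt_box_real(as_double(a) + as_double(b));
}

// Roughly what a naive lowering of (fn [a b] (+ (+ a b) 1)) turns into:
// every step is an opaque runtime call on boxed values.
ObjectRef lowered_f(ObjectRef a, ObjectRef b) {
  ObjectRef t0 = rt_add(a, b);
  ObjectRef one = rt_box_int(1);
  return rt_add(t0, one);
}
```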

                • codebje

                  today at 2:36 AM

                  The article talks about inlining a two-arity call to clojure.core/max to instead be an explicit call to cpp/jank.runtime.max, eliminating the unnecessary argument count matching and recursion portions of the Clojure function.
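The arity-dispatch cost being removed here can be sketched in C++ with hypothetical `generic_max` and `runtime_max` functions standing in for clojure.core/max and the runtime's two-argument max: the generic entry point has to match on the argument count and recurse for three or more arguments, while a call site whose arity is known statically can call the two-argument function directly. Plain integers are used for brevity; the real functions operate on boxed, polymorphic values.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for the runtime's two-argument max (hypothetical name).
std::int64_t runtime_max(std::int64_t a, std::int64_t b) { return a < b ? b : a; }

// Stand-in for a multi-arity function reached through a generic entry point:
// it must match on the argument count and, for 3+ arguments, recurse.
std::int64_t generic_max(const std::vector<std::int64_t>& args) {
  switch (args.size()) {
    case 0:
      throw std::invalid_argument("max needs at least one argument");
    case 1:
      return args[0];
    case 2:
      return runtime_max(args[0], args[1]);
    default: {
      // Fold the first two arguments, then recurse on the shorter list.
      std::vector<std::int64_t> rest(args.begin() + 1, args.end());
      rest[0] = runtime_max(args[0], args[1]);
      return generic_max(rest);
    }
  }
}

// (max a b) before inlining: build an argument list, then dispatch on arity.
std::int64_t call_site_generic(std::int64_t a, std::int64_t b) { return generic_max({a, b}); }

// The same call site once the arity is known: a direct two-argument call.
std::int64_t call_site_inlined(std::int64_t a, std::int64_t b) { return runtime_max(a, b); }
```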

                  It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.

                  The jank::runtime::max call is itself complex: it has to type check its arguments and work out what to actually do based on the two types. If parts of these tests are done before the inlined call to max, there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example, the fact that a previous test will likely have identified whether the argument is an int or something else should hopefully carry over for ::lte, ::sub, and ::add and simplify those down to just the single operator call. Sadly, I suspect it won't, at least for the addition, because the recursive call will lose the information that the return value when called with a tagged integer is always a tagged integer.
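A C++ sketch of the redundant checks discussed above, assuming a hypothetical pointer-tagging scheme (low bit set means a 63-bit integer) and hypothetical runtime ops `rt_lte`, `rt_sub`, and `rt_add` that each re-check their own arguments' tags. The comments mark which checks become provably redundant once the ops are inlined and which remain because they test the results of the recursive calls. Only the integer path is modelled; everything else goes to a placeholder slow path.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical tagged word: low bit 1 => integer stored in the upper 63 bits;
// low bit 0 => pointer to a heap object (not modelled in this sketch).
using Value = std::uint64_t;

inline bool is_int(Value v) { return (v & 1u) != 0; }
inline Value tag_int(std::int64_t i) { return (static_cast<std::uint64_t>(i) << 1) | 1u; }
inline std::int64_t untag_int(Value v) { return static_cast<std::int64_t>(v) >> 1; }

[[noreturn]] inline void slow_path() {
  throw std::logic_error("boxed numbers are not modelled in this sketch");
}

// Each runtime op re-checks the tags of its own arguments.
inline bool rt_lte(Value a, Value b) {
  if (is_int(a) && is_int(b)) return untag_int(a) <= untag_int(b);
  slow_path();
}
inline Value rt_sub(Value a, Value b) {
  if (is_int(a) && is_int(b)) return tag_int(untag_int(a) - untag_int(b));
  slow_path();
}
inline Value rt_add(Value a, Value b) {
  if (is_int(a) && is_int(b)) return tag_int(untag_int(a) + untag_int(b));
  slow_path();
}

// (defn fib [n] (if (<= n 1) n (+ (fib (- n 1)) (fib (- n 2)))))
Value fib(Value n) {
  // rt_lte establishes is_int(n) on the path that continues past it, so the
  // is_int(n) checks inside both rt_sub calls are redundant once inlined.
  if (rt_lte(n, tag_int(1))) return n;
  // rt_add's checks test the results of the recursive calls; nothing here
  // tells the optimizer that fib of a tagged integer returns one, so they stay.
  return rt_add(fib(rt_sub(n, tag_int(1))), fib(rt_sub(n, tag_int(2))));
}
```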

                  A future optimisation might be to specialise for unboxed types: far more potential speed improvement than pointer tagging, and IMO quite amenable to analysis with the jank IR (use :metadata to tag functions as specialised for <type> with the new entry point; if a function only calls specialised functions (and itself), it too can be specialised; and a heuristic decides whether specialisation gains enough to justify the extra space).
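The unboxed specialisation suggested above might look roughly like this C++ sketch, using the same kind of hypothetical tagging scheme: `fib_i64` is a specialised entry point on raw 64-bit integers that only calls itself, so it needs no tag checks or boxing at all, and the generic `fib` performs a single tag check before handing off. For brevity it ignores overflow, which Clojure's `+` checks for, and the 63-bit range of tagged integers; a real specialisation would have to preserve those semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical tagged word: low bit 1 => integer stored in the upper 63 bits.
using Value = std::uint64_t;

inline bool is_int(Value v) { return (v & 1u) != 0; }
inline Value tag_int(std::int64_t i) { return (static_cast<std::uint64_t>(i) << 1) | 1u; }
inline std::int64_t untag_int(Value v) { return static_cast<std::int64_t>(v) >> 1; }

// Specialised entry point, valid only when n is known to be an integer.
// It only calls itself, so it stays specialised all the way down:
// plain machine arithmetic, no tag checks, no boxing.
std::int64_t fib_i64(std::int64_t n) {
  return n <= 1 ? n : fib_i64(n - 1) + fib_i64(n - 2);
}

// Generic entry point: one tag check, then hand off to the specialisation.
Value fib(Value n) {
  if (is_int(n)) return tag_int(fib_i64(untag_int(n)));
  throw std::invalid_argument("non-integer fib is not modelled in this sketch");
}
```

The space cost mentioned in the comment shows up here as a second copy of the function body, which is what a heuristic would weigh against the saved checks.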