FP8 is ~100 tflops faster when the kernel name has "cutlass" in it

243 points - 07/11/2025

Source
  • hvenev

    07/11/2025

    In `libnvidia-nvvm.so` the string `cutlass` appears right after `Memory Dependence Analysis` and `memdep`. Perhaps it acts as an optimization attribute of some sort, where the compiler is allowed to make assumptions about the kernel's behavior that are not valid in general?
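One way to reproduce this kind of observation is to list the printable strings in a binary in file order and look at what sits next to the one you care about, roughly what the `strings` tool does. The following is a minimal sketch only: the helper names are hypothetical, and the demo runs on a synthetic byte blob rather than on the real library.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len bytes, in file order."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def neighbors(strings: list[str], needle: str, radius: int = 2) -> list[list[str]]:
    """For each exact match of needle, return the strings within radius of it."""
    return [
        strings[max(0, i - radius): i + radius + 1]
        for i, s in enumerate(strings)
        if s == needle
    ]


if __name__ == "__main__":
    # Synthetic stand-in for a string table; NOT real bytes from the library.
    blob = b"\x00Memory Dependence Analysis\x00memdep\x00cutlass\x00"
    print(neighbors(printable_strings(blob), "cutlass"))
```

To try it on an actual file, read it with `open(path, "rb").read()` and pass the bytes to `printable_strings`. Adjacency in a string table is only a hint about how the strings are used, not proof.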

      • jdright

        07/11/2025

        Yes, that is a very common (and well-known) way for vendors to apply specific optimizations for known things.

        It is also part of the benchmarks game they play against each other.

          • MBCook

            07/11/2025

            The link is long dead and the Wayback machine doesn’t have a copy.

            But in 2001 ATI was caught applying optimizations to Quake 3 when someone realized if you renamed the executable from “quake” to “quack” the score dropped a ton. It was a big scandal.

            I know that’s common now but that wasn’t a thing that was done at the time.

              • atomicnumber3

                07/11/2025

                Was it a scandal at the time? My understanding of how per-game card-driver optimizations work today is:

                1. AAAA Game Studio shits out another unoptimized clunker

                2. nvidia considers it a reputational risk if games run at 30 FPS on a 5090

                3. They go in, look at the perverse ways the game misuses rendering primitives, and then hacks shit in to make whatever bad things they're doing less bad.

                As a gamer, this seems fine to me and i generally blame the AAAA devs for being bad at their jobs or AAAA studio leads for being ok shipping unoptimized messes.

                  • antonvs

                    07/11/2025

                    > As a gamer, this seems fine to me

                    As a software developer, it almost certainly has a bad effect on the ecosystem long term. "Hacks shit in" is the very definition of technical debt, and that has a cost that someone, somewhere is going to have to pay in some form.

                      • RHSeeger

                        07/11/2025

                        I can't reply to the person that replied to you, so

                        > You’re looking as a dev, but the reality is that a consumer cannot see technical debt.

                        The consumer can't _see_ technical debt, but they sure as heck can be impacted by it.

                        - Technical debt means the code base is harder to work with later. So fixes/enhancements take longer to make it into the code (and sometimes never can)

                        - This particular type of technical debt means the code by the game developers sets precedent, and the next developer may use it as an example. So the amount of code incorrectly using the API grows faster over time

                          • strbean

                            07/11/2025

                            For some reason HN sometimes hides the reply button on leaf comments. I think this only happens for very new comments.

                            You can click the timestamp ("X minutes ago") to view the comment without context, and reply from there.

                              • monocasa

                                07/11/2025

                                I think it's an anti-flamewar tactic to put the brakes on quick replies.

                            • charcircuit

                              07/11/2025

                              >the next developer may use it as an example

                              These hacks are game specific, so another developer wouldn't get them.

                                • RHSeeger

                                  07/11/2025

                                  The way the API was used incorrectly "worked", and the game didn't see the negative impact of it because it was "fixed away". And then the incorrect usage is used again on another game and doesn't get the "fixed away" benefit. And the same incorrect usage could happen over and over because "it works".

                                  • lsaferite

                                    07/11/2025

                                    The next developer at that company that uses or references the crappy code for another project would still have the issue, but not get the benefit of the down-stream GPU vendor hacks to fix the buggy game.

                            • 07/11/2025

                              • cyanydeez

                                07/11/2025

                                Does anyone talk about how technical debt often just gets thrown into the garbage so we can buy fancy new technical crap, and it's what pays for most of y'all's jobs?

                                • SideQuark

                                  07/11/2025

                                  > technical debt, and that has a cost that someone, somewhere is going to have to pay in some form

                                  There is no reason anyone has to pay each and every iota of technical debt. Plenty of things with technical debt hit end of life and no one ever looks in that code again. I suspect most technical debt goes this way - in program, program never updates (or minor updates), then dies.

                                  Your claim would require that every piece of technical debt in anything ever (code, buildings, cars, anywhere) be removed before the thing goes end of life or goes into a mode where it never is changed. That seems ludicrous to me.

                                  • 07/11/2025

                                    • monkpit

                                      07/11/2025

                                      You’re looking as a dev, but the reality is that a consumer cannot see technical debt. If the studio churns out a game, the vendor sprinkles on some optimizations, people play it and move on, then the tech debt just vaporizes into the void. It’s not real at that point.

                                        • wtetzner

                                          07/11/2025

                                          Just because a consumer can't see technical debt doesn't mean they aren't paying for it. Most game studios continue to re-use code, so it doesn't just "vaporize" into the void.

                                            • 8n4vidtmkvmk

                                              07/11/2025

                                              I'm pretty sure I pay this debt with lost FPS and every time I glitch through the floor into the nether.

                                  • btbuilder

                                    07/11/2025

                                    I believe the driver silently swapped the textures to lower quality ones that looked worse but gave a performance boost.

                                    • toast0

                                      07/11/2025

                                      > Was it a scandal at the time?

                                      Yes. My understanding was it was optimized by reducing precision or something to a visibly apparent degree.

                                      It's different if the driver changes things in ways such that rendered output is the same or at least imperceptibly different. I think there's also a lot more communication between gpu makers and game/engine developers these days; plus a lot more frequent updates.

                                        • KronisLV

                                          07/11/2025

                                          > My understanding was it was optimized by reducing precision or something to a visibly apparent degree.

                                          If only we had that sort of a control over rendering for every game ourselves - since projects like OptiScaler at least let us claw back control over sometimes proprietary upscaling and even framegen, but it's not quite enough: https://github.com/optiscaler/OptiScaler

                                          I'd also mention Lossless Scaling here, though it still only works on upscaling and framegen and with worse methods, but at least works for most games out there: https://store.steampowered.com/app/993090/Lossless_Scaling/

                                          I want to be able to freely toggle between different types of AA and SSAO and reflections and lighting and LOD systems and various shader effects (especially things like chromatic aberration or motion blur) and ray tracing and all that, instead of having to hope that the console port that's offered to me has those abilities in the graphics menu and that whoever is making the decisions hasn't decided that actually "low" graphics (that would at least run smoothly) would look too bad for the game's brand image or something.

                                      • mcculley

                                        07/11/2025

                                        I was surprised to see “AAAA”. I didn’t know there were 4 As now.

                                        “AAAA Game Studio shits out another unoptimized clunker” seems a paradoxical statement to me. I would have thought “AAAA” meant “highly resourced” game company. Does it just mean high revenue? Lots of players?

                                          • bigfishrunning

                                            07/11/2025

                                            AAAA isn't a real thing, it's a memey joke based on a press release by a microsoft studio that was closed before ever releasing a single game

                                            • wtetzner

                                              07/11/2025

                                              AAA/AAAA just means "how much money was spent developing the game". High cost doesn't automatically equal high quality. In fact, it seems after a certain point to mean the opposite.

                                              • vinceguidry

                                                07/11/2025

                                                The more money you throw at an effort, the more gets flushed out as waste, and the harder it is to maintain quality. Pretty universal across business.

                                                • mwpmaybe

                                                  07/11/2025

                                                  High price...

                                              • bayindirh

                                                07/12/2025

                                                A friend of mine developed his own game engine, and what he said is you need to bargain with the nVidia driver, because the hardware doesn't perform at its peak when you write everything honoring the spec, and the driver feels free to ignore your commands about how you want to do some things (e.g. memory transfers).

                                                Like board manufacturers, the game developers also need to please the drivers and do things the way the driver silently dictates to them (regardless of what DirectX, OpenGL or Vulkan says), otherwise all bets are off.

                                                • itsTyrion

                                                  07/11/2025

                                                  it rendered in lower quality, IIRC lower textures / much more aggressive mipmapping and/or LOD

                                                  • gmueckl

                                                    07/11/2025

                                                    Except that if a developer has that kind of market pull, nVidia will gladly help those devs with getting it right. They are excellent at maintaining developer relations.

                                                • IAmBroom

                                                  07/11/2025

                                                  In at least one past version of Windows (circa 1990s), if you tried to replace the default web browser of IE with another choice you were given an Open File dialog window to choose the executable.

                                                  Funny quirk, though: that particular window wouldn't show files named firefox.exe. It would accept that as typed input, if you were at the correct folder, but the file listing omitted that particular file.

                                                  Maybe it was mozilla.exe; it was a long time ago. But that was the discovery that pushed me off IE forever.

                                                    • lstamour

                                                      07/11/2025

                                                      I vaguely remember that being the start of the browser prompts to set your current browser as the default. It was so hard to just configure that they had to build a way to set it within the browser.

                                                      You saw that again in more modern times when Microsoft removed support for the APIs they provided to set browser defaults, forcing browser makers to write step by step instructions on what to click to set the default browser.

                                                      I believe they walked that back, but it left such a bad taste that I switched my installation of Windows from default mode to EU mode in order to avoid it. And come to think of it, I haven’t used my windows machine for much outside of AI in about 6 months.

                                                      But Microsoft is not alone in these sort of defaults games - every OS or browser maker, Apple, Google, Firefox, wants to create moats so they can more easily monetize your usage of a product. I never thought I’d prefer the business model of free to play games, where they just outright ask you for money and have to keep finding new ways to entertain instead of relying on hard to change defaults and selling your data.

                                                        • charcircuit

                                                          07/11/2025

                                                          An app being able to set itself as the default browser sounds like such a dangerous API, especially if it can be done silently without the user realizing it.

                                                  • hinkley

                                                    07/11/2025

                                                    There are bugs that certain games rely on and features that some don’t use. I’m currently trying to optimize a library out of spite. (I want it to work better than the competitor that caused me a lot of problems on a recent project). The amount of conditional logic around what is essentially a function to increment a value is breathtaking.

                                                      • gatlin

                                                        07/11/2025

                                                        Do you have any kind of example you're able to share? I don't mean to take your IP but I want to see this breathtaking vista.

                                                          • amiga386

                                                            07/11/2025

                                                            A simple example would be that the function glGetString(GL_EXTENSIONS) crashes the original Quake engine and many licensees, because it's expecting no more than a 256 character string.

                                                            The driver looks to see if a known old game is calling it, and if it's one known to crash, it returns no more than 256 characters, and likely also puts all the _old_ extensions that the game is likely to know and react to in the string.

                                                            There are also all sorts of games that called APIs in a particular order or set particular options, because they represented a "fast path" at the time, and now they don't, but if you're that program, then yes they do.

                                                            Ultimately, this clutter is what led to the development of the Vulkan API, to stop games second-guessing graphics APIs which themselves second-guess the games.
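The `glGetString(GL_EXTENSIONS)` workaround described above can be sketched roughly as follows. This is an illustration under stated assumptions, not actual driver code: the executable names are hypothetical, the function `extensions_for` is made up for this example, and the sketch assumes the extension list is already ordered oldest first so that truncation keeps the names an old game would recognize.

```python
LEGACY_LIMIT = 256  # buffer size the old engine is said to assume

# Hypothetical executable names; a real driver's list is not public.
LEGACY_EXECUTABLES = {"glquake.exe", "quake2.exe"}


def extensions_for(executable: str, extensions: list[str]) -> str:
    """Return the space-separated extension string to report to this program."""
    if executable.lower() not in LEGACY_EXECUTABLES:
        return " ".join(extensions)

    kept: list[str] = []
    length = 0
    for name in extensions:
        extra = len(name) + (1 if kept else 0)  # +1 for the separating space
        if length + extra >= LEGACY_LIMIT:      # keep room for the C NUL terminator
            break
        kept.append(name)
        length += extra
    return " ".join(kept)
```

Cutting at a whole-name boundary matters: a string chopped in the middle of an extension name could make the game believe it found an extension that does not exist.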

                                                            • hinkley

                                                              07/11/2025

                                                              To avoid doxxing myself: In a deep call stack it’s possible to end up sanitizing inputs multiple times and in different ways.

                                                              A frequent example I’ve encountered is web frameworks that have to keep checking for escaped text because they didn’t write it in horizontal layers where you know for sure that all inputs have been scrubbed when they reach this function but not that one. So the same functions get called with data that comes from your team and from customers. Reuse is tricky.

                                                                • hamburglar

                                                                  07/11/2025

                                                                  “Checking for escaped text” is the sort of nonsense that tells you you’re dealing with amateur developers.

                                                                    • withinboredom

                                                                      07/11/2025

                                                                      Indeed. The rules are simple:

                                                                      - Unescape, sanitize or validate at all entry points.

                                                                      - Escape all outputs (this includes the database queries).

                                                                      If you follow those simple rules, you never have to check once you are past a controller. And you should fuzz your controllers to make sure no unexpected data makes it past there.
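A minimal sketch of those two rules, assuming a toy comment field stored in SQLite and rendered as HTML. The function names and the validation limits are made up for illustration, and the database step uses parameter binding in place of manual escaping.

```python
import html
import sqlite3


def validate_comment(raw: str) -> str:
    """Entry point: validate once; code past this point trusts the value."""
    text = raw.strip()
    if not text or len(text) > 500 or not text.isprintable():
        raise ValueError("invalid comment")
    return text


def store_comment(conn: sqlite3.Connection, text: str) -> None:
    """Output to the database: bound parameter, never string concatenation."""
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def render_comment(text: str) -> str:
    """Output to HTML: escape at the point where the markup is produced."""
    return "<p>" + html.escape(text) + "</p>"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (body TEXT)")
    store_comment(conn, validate_comment("<b>hi</b>'); DROP TABLE comments;--"))
    (body,) = conn.execute("SELECT body FROM comments").fetchone()
    print(render_comment(body))
```

The stored value stays unescaped, so the same data can later be rendered to a different output (JSON, plain text) with that output's own escaping.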

                                                                        • hinkley

                                                                          07/12/2025

                                                                          Thing about taking a job is they don’t generally let you look at the code first and nope out if it’s fucked six ways to Sunday.

                                                                          Everyone has clever answers for greenfield projects and empty rhetoric for brown.

                                                      • bosco_mcnasty

                                                        07/12/2025

                                                        the etiology of this hack is pretty obvious with a simple google search:

                                                        https://docs.nvidia.com/cutlass/index.html

                                                        it presumably makes various assumptions and speedups for NVIDIA's matrix multiplication library... called cutlass

                                                    • MichaelZuo

                                                      07/11/2025

                                                      It’s really strange for established companies to waste their credibility on games like that…

                                                        • MangoToupe

                                                          07/11/2025

                                                          I was pretty young at the time, but I recall the market for graphics being a lot wider open at the time Quake was released. Remember 3dfx? They produced the Voodoo series of graphics cards. They're barely a distant memory now.

                                                          Quake was also the standard for a game that was willing to fully exploit the hardware of the time.

                                                          • IAmBroom

                                                            07/11/2025

                                                            Never underestimate how much human ego will control actions.

                                                    • high_na_euv

                                                      07/11/2025

                                                      That's very likely imo

                                                  • KomoD

                                                    07/11/2025

                                                    actual link: https://github.com/triton-lang/triton/pull/7298

                                                      • bede

                                                        07/11/2025

                                                        Thank you, perhaps the parent can be edited to use this URL instead

                                                    • PLenz

                                                      07/11/2025

                                                      The Volkswagen emissions testing model

                                                        • fambalamboni

                                                          07/11/2025

                                                          [dead]

                                                      • spoaceman7777

                                                        07/11/2025

                                                        Seems this is likely due to ongoing work on FP8 support on nvidia/cutlass. From my reading, the alternative code path was likely added recently for testing by external contributors to the cutlass project, and other involved parties. (Rather than attempting to distribute custom packaged internal builds of cuda.)

                                                        This ticket is a good starting place to see the chain of issues around the ongoing work: https://github.com/NVIDIA/cutlass/pull/2037

                                                          • Xss3

                                                            07/12/2025

                                                            The real answer

                                                        • tempaway43563

                                                          07/11/2025

                                                          So, what is Cutlass, can someone explain whether checking for kernel names makes sense here or is a form of cheating?

                                                          https://docs.nvidia.com/cutlass/index.html

                                                            • gpm

                                                              07/11/2025

                                                              Github version: https://github.com/NVIDIA/cutlass

                                                              I wonder if we search the comments if we can find something referencing this.

                                                              • rurban

                                                                07/11/2025

                                                                That's strange because the cutlass docs explicitly do NOT mention fp8 support. So it looks like it can be used nevertheless with fp8 by using the name hack.

                                                                  • mlazos

                                                                    07/11/2025

                                                                    It supports e5m2 and e4m3 right in the doc linked.

                                                            • high_na_euv

                                                              07/11/2025

                                                              I have a little experience with compilers and LLVM, but you'd be shocked how many things rely on names and parsing names.

                                                              If you have hundreds of passes that are complex and rely on various "contracts" like type names or some shit, then really crazy things like this can happen unintentionally and not maliciously

                                                                • diggan

                                                                  07/11/2025

                                                                  Web-developers are well aware of this too. Sincerely, Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0
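For anyone who has not run into it, here is a small sketch of why substring checks on the User-Agent are fragile. The Firefox string is the one from the comment above; the Chrome string only follows the typical shape of that browser's header, with illustrative version numbers, and both function names are hypothetical.

```python
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0"
# Typical shape of a Chrome User-Agent; the version numbers are illustrative.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def naive_sniff(ua: str) -> str:
    """First matching substring wins, so the check order decides the answer."""
    for token in ("Safari", "Chrome", "Firefox"):
        if token in ua:
            return token
    return "unknown"


def ordered_sniff(ua: str) -> str:
    """Check the more specific product tokens before the ones others imitate."""
    for token in ("Firefox/", "Chrome/", "Safari/"):
        if token in ua:
            return token.rstrip("/")
    return "unknown"
```

With these inputs `naive_sniff` reports the Chrome string as Safari, because the header carries other browsers' names for compatibility. Feature detection avoids the problem by not parsing names at all.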

                                                                    • bravesoul2

                                                                      07/11/2025

                                                                      Funny we send a browser wars tombstone in every request!

                                                                        • antonvs

                                                                          07/11/2025

                                                                          Let's have a moment of silence for Gecko/20100101

                                                                  • the8472

                                                                    07/11/2025

                                                                    Some names are standardized items, like memcpy. Matching those is ok, nothing sneaky going on there. Matching something vendor-specific in a general-purpose API is different story.

                                                                    • halJordan

                                                                      07/11/2025

                                                                      Why would i be shocked that a name is informative. Like... are you surprised that wrought iron is wrought? Or cast iron is made from a cast?

                                                                        • IAmBroom

                                                                          07/11/2025

                                                                          Dog piles are often neither composed of dogs, nor actual piles.

                                                                          Names can be both informative, and misdirecting, at the same time.

                                                                  • orlp

                                                                    07/11/2025

                                                                    GenuineIntel moment.

                                                                      • reitzensteinm

                                                                        07/11/2025

                                                                        Or maybe Quack III: Arena. https://m.slashdot.org/story/21054

                                                                          • bayindirh

                                                                            07/11/2025

                                                                            Ooh, I remember this, but actually the thing is older than it.

                                                                            First, nVidia and ATI used executable names for detecting games, then they started to add heuristics.

                                                                            If you think they stopped the practice, you're very mistaken. Every AMD and nVidia driver has game and app specific fixes and optimizations.

                                                                            nVidia cheated in 3D Mark that way, so they patched/changed their benchmark to prevent it. Also, again nVidia, patched their drivers so some of the more expensive but visually invisible calls like scene flushes in a particular game are batched (e.g. do all 50 flushes at the 50th call) to prevent the game becoming a slide show on expensive hardware.

                                                                            This is also why AMD's and Intel's open source drivers under Linux are a success, because they are vanilla drivers written from scratch per spec, and if your code calls OpenGL/Vulkan to spec, then you're golden.

                                                                            Even some companies cross compile AMD's Linux drivers for windows on embedded systems since they're free from useless optimizations from them.
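The flush batching mentioned above ("do all 50 flushes at the 50th call") could look something like this in spirit. It is a sketch only: `FlushCoalescer` and its methods are hypothetical names, and a real driver would also have to flush at points where correctness requires it, which the `drain` method merely hints at.

```python
from typing import Callable


class FlushCoalescer:
    """Swallow flush requests and forward only every Nth one to the real flush."""

    def __init__(self, real_flush: Callable[[], None], every: int = 50) -> None:
        if every < 1:
            raise ValueError("every must be at least 1")
        self._real_flush = real_flush
        self._every = every
        self._pending = 0

    def flush(self) -> None:
        """Called by the application; usually returns without doing any work."""
        self._pending += 1
        if self._pending == self._every:
            self._pending = 0
            self._real_flush()

    def drain(self) -> None:
        """Forward any leftover requests, for example at the end of a frame."""
        if self._pending:
            self._pending = 0
            self._real_flush()
```

With `every=50`, 120 calls to `flush()` reach the real flush twice, and a final `drain()` accounts for the remaining 20.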

                                                                            • dahauns

                                                                              07/11/2025

                                                                              Aah, that brings back memories...

                                                                              Interestingly, most benchmark controversies back in the day are now expected behaviour, i.e. game-specific optimizations with no (well, in this age of upscalers and other lossy optimization techniques, probably even somewhat) visible image degradation. A gaming-specific driver with no game-specific improvements in its changelog would be considered strange, and it very much works with executable detection.

                                                                              Back in the day, there was still the argument that drivers should not optimize for benchmarks even when visually identical, because it wouldn't show the hardware's real world potential. Kinda cute from today's perspective. :)

                                                                              But of course there were the obvious cases...

                                                                              The Quack3 lowering filtering quality as shown above, of course (at least that one was put into the driver as a togglable setting later on).

                                                                              But the most cheeky one has to be nVidia's 3dmark03 "optimizations", where they blatantly put static clip planes into the scenes so that everything outside the predefined camera path from the benchmark sequence would simply be cut from the scene early (which e.g. fully broke the freelook patched into 3dmark and would generally break any interactive application)

                                                                                • bayindirh

                                                                                  07/11/2025

                                                                                  You beat me to it. Grrr...

                                                                                  Just kidding, nice to see another person who remembers these things. Want some root beer?

                                                                              • BoredPositron

                                                                                07/11/2025

                                                                                Now I want a Quake shooter but with ducks.

                                                                                  • carlos22

                                                                                    07/11/2025

                                                                                    Not ducks, but chickens, was very popular in Germany back in the day: https://en.wikipedia.org/wiki/Crazy_Chicken

                                                                                      • avhception

                                                                                        07/11/2025

                                                                                        Oh wow, that was a blast from the past. The Moorhuhn craze!

                                                                                        Many people, including me, didn't have an internet connection back in the day. The Sneakernet went into overdrive to get everyone a copy!

                                                                                    • supportengineer

                                                                                      07/11/2025

                                                                                      A Duck Hunt, if you will…

                                                                                  • iforgotpassword

                                                                                    07/11/2025

                                                                                    I think that was the first case (to go public), but I remember reading about this in game magazines a couple times after this, for both ATI and nvidia.

                                                                                    • 07/11/2025

                                                                                  • hofrogs

                                                                                    07/11/2025

                                                                                    I'm interested in that story, what are you referring to with "GenuineIntel"?

                                                                                      • orlp

                                                                                        07/11/2025

                                                                                        Intel's C++ compiler is known to add branches in its generated code checking if the CPU is "GenuineIntel" and if not use a worse routine: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Support....

                                                                                          • bayindirh

                                                                                            07/11/2025

                                                                                            Even in the middle of that turmoil, we managed to compile some code with Intel's ICC and make it go faster on AMD Opterons, breaking Intel's own numbers.

                                                                                            When my colleague said that they managed to go faster than intel with icc with some hand tuned parameters, I remember answering "youdidwat?".

                                                                                            Good times.

                                                                                            • microtonal

                                                                                              07/11/2025

                                                                                              Also MKL:

                                                                                              https://danieldk.eu/Intel-MKL-on-AMD-Zen

                                                                                              • pieterbreed

                                                                                                07/11/2025

                                                                                                Is this for the runtime of the compiled code or for the compiling machine? Do they generate slow code if the compiler is running on non-intel?

                                                                                                  • Uvix

                                                                                                    07/11/2025

                                                                                                    Runtime of the compiled code. The ostensible intent is so that new processors can use new features like SIMD, while offering a fallback for older ones. In practice, they’re detecting an Intel processor, not just the specific feature.
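
                                                                                                    To make the distinction concrete, here is a rough Python sketch of the two dispatch styles. It is an illustration only, not Intel's actual dispatcher: the `Cpu` class, both `pick_*` functions, and the feature names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cpu:
    """Hypothetical stand-in for what CPUID reports; not a real API."""
    vendor: str
    features: frozenset = field(default_factory=frozenset)


def pick_by_vendor(cpu: Cpu) -> str:
    # Vendor-gated dispatch: the fast path requires the vendor string to match,
    # even if another vendor's CPU supports the same instructions.
    if cpu.vendor == "GenuineIntel" and "avx2" in cpu.features:
        return "avx2"
    return "baseline"


def pick_by_feature(cpu: Cpu) -> str:
    # Feature-gated dispatch: any CPU that reports the feature gets the fast path.
    return "avx2" if "avx2" in cpu.features else "baseline"


# Two hypothetical CPUs with identical feature sets but different vendors.
intel = Cpu("GenuineIntel", frozenset({"sse2", "avx2"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "avx2"}))

for cpu in (intel, amd):
    print(cpu.vendor, pick_by_vendor(cpu), pick_by_feature(cpu))
```

                                                                                                    With identical feature sets, the vendor-gated version sends the AMD part down the baseline path while the feature-gated version does not.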

                                                                                                    • kstrauser

                                                                                                      07/11/2025

                                                                                                      For the compiled code. Its output deliberately runs slower on non-Intel CPUs.

                                                                                                      • SSLy

                                                                                                        07/11/2025

                                                                                                        the runtime. patching cpuid makes the code go faster

                                                                                            • _zoltan_

                                                                                              07/11/2025

                                                                                              [flagged]

                                                                                          • koakuma-chan

                                                                                            07/11/2025

                                                                                            is 100 tflops a lot?

                                                                                              • saagarjha

                                                                                                07/11/2025

                                                                                                It's like 5-10% here

                                                                                                  • irrelative

                                                                                                    07/11/2025

                                                                                                    Correct, this is the actual headline too. 100 tflops sure seems like it'd be more than that, but here we are.

                                                                                                    If the headline was "FP8 is ~7% faster when the kernel name has 'cutlass' in it...", it wouldn't seem sensational.
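
                                                                                                    For scale, a back-of-the-envelope check of what a 100 TFLOPS gain implies if it really is 5-10% of the total. The helper name `implied_baseline` is made up for this sketch, and the only inputs are the numbers quoted in this thread.

```python
def implied_baseline(gain_tflops: float, relative_gain: float) -> float:
    """Return the baseline throughput for which gain_tflops equals relative_gain of it."""
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return gain_tflops / relative_gain


# 5%, 7% and 10% are the figures mentioned in the thread; 100 is the headline gain.
for pct in (0.05, 0.07, 0.10):
    base = implied_baseline(100.0, pct)
    print(f"{pct:.0%} gain -> baseline of about {base:.0f} TFLOPS")
```

                                                                                                    That works out to a baseline of roughly 1,000 to 2,000 TFLOPS, which is why the absolute number sounds bigger than the relative one.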

                                                                                                      • saagarjha

                                                                                                        07/11/2025

                                                                                                        I think the interesting part is that it improves performance measurably at all, not the actual number. These people are trying to hit 90+% MFU (though most don't reach it) so this does actually translate to many millions of dollars for them.

                                                                                                • progx

                                                                                                  07/11/2025

                                                                                                  5060 ti +~15%

                                                                                                  • brightmood

                                                                                                    07/11/2025

                                                                                                    yea

                                                                                                    • HideousKojima

                                                                                                      07/11/2025

                                                                                                      According to Terminator 3 Skynet used a mere 60 TFLOPS

                                                                                                        • IAmBroom

                                                                                                          07/11/2025

                                                                                                          How much is that in jiggawatts per parsec?

                                                                                                  • rowanG077

                                                                                                    07/11/2025

                                                                                                    Let's hope for Nvidia this is an innocent optimization only valid for internal kernels that cannot be applied in general.

                                                                                                      • jagrsw

                                                                                                        07/11/2025

                                                                                                        In which case checking for a string inside an arbitrary name is sloppy (a bug).

                                                                                                    • Arch-TK

                                                                                                      07/11/2025

                                                                                                      I wish people either learned how to use git or just wholesale stopped using it.

                                                                                                        • leoh

                                                                                                          07/11/2025

                                                                                                          Context?

                                                                                                            • Arch-TK

                                                                                                              07/12/2025

                                                                                                              The PR?

                                                                                                      • giingyui

                                                                                                        07/11/2025

                                                                                                        And what’s the downside of using that kernel name? It can’t just be that it’s faster and nothing else. Unless they included lots of sleep(x) calls.

                                                                                                          • samus

                                                                                                            07/11/2025

                                                                                                            There might be optimizations that are only safe for the code that this was intended for.

                                                                                                              • bialpio

                                                                                                                07/11/2025

                                                                                                                Seems like a bad idea to rely on a name for deciding this then, unless it's documented somewhere that using names containing certain substrings may trigger unsafe optimizations...
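
                                                                                                                A small sketch of why a substring test is a blunt trigger. Nobody in this thread knows how the NVIDIA toolchain actually matches the name, so `substring_gate` and the kernel names below are purely hypothetical; the point is only that any name containing the marker passes.

```python
def substring_gate(kernel_name: str, marker: str = "cutlass") -> bool:
    # Hypothetical gate: true whenever the marker appears anywhere in the name.
    return marker in kernel_name


# Hypothetical kernel names, chosen only to show the false positive.
names = [
    "cutlass_attention_fwd",     # deliberately prefixed name
    "my_cutlass_inspired_gemm",  # unrelated kernel that happens to match
    "attention_fwd",             # no marker, no special treatment
]

for name in names:
    print(f"{name}: {substring_gate(name)}")
```

                                                                                                                The second name has nothing to do with the library, yet it would get the same treatment.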

                                                                                                        • arzookanak

                                                                                                          07/11/2025

                                                                                                          [dead]

                                                                                                          • nolok

                                                                                                            07/11/2025

                                                                                                            Intel's quest to move from "trusted by default / the reference" to "check for scam" is getting worse every release. And it's 100% self inflicted. How weird.

                                                                                                              • aleph_minus_one

                                                                                                                07/11/2025

                                                                                                                In my understanding of the PR, it rather seems that NVidia is the company that is cheating. :-)

                                                                                                                • pkhuong

                                                                                                                  07/11/2025

                                                                                                                  NVIDIA-inflicted in this case.

                                                                                                              • zahlman

                                                                                                                07/11/2025

                                                                                                                This tweet appears to be taking the original material out of context to misrepresent it:

                                                                                                                > Rewrite the attention kernel to be persistent. This gives better performance at low-contexts. However, fp16 at large context has suffered a bit due to a ptxas instruction scheduling issue in the softmax partition. fp8 is ~100 tflops faster when the kernel name has "cutlass" in it.

                                                                                                                The charitable reading is that, on certain kernels, using fp8 rather than fp16 values gives better performance. (Although I can't even see how the numbers relate to a "~100 tflops faster" claim in any respect, nor does it even list any kernel names or suggest a control kernel!) But this is being presented as if someone has uncovered evidence of cheating on benchmarks.

                                                                                                                  • zettabomb

                                                                                                                    07/11/2025

                                                                                                                    No, that sentence is separate from the rest. Take a look at the pull request:

                                                                                                                        # Up to 150 TFLOPS faster for fp8!
                                                                                                                        if specialization.constants["dtype"] == gl.float8e5:
                                                                                                                            name = "cutlass_" + name

                                                                                                                      • zahlman

                                                                                                                        07/11/2025

                                                                                                                        The tweet is quoting from the first message in the "conversation" on the PR. There are 93 commits in the PR and GitHub doesn't even default to that tab. I looked at the obvious text and drew the conclusion that was obvious to me.

                                                                                                                    • imtringued

                                                                                                                      07/11/2025

                                                                                                                      https://github.com/triton-lang/triton/pull/7298/commits/a5e2...

                                                                                                                      It's literally in the code.

                                                                                                                        • zahlman

                                                                                                                          07/11/2025

                                                                                                                          I already had to deal with Twitter and a link shortening service just to get to GitHub and then it still only pointed to the facing page of a 93-commit PR.

                                                                                                                      • saagarjha

                                                                                                                        07/11/2025

                                                                                                                        I think you're the one doing that to the tweet, actually.

                                                                                                                          • zahlman

                                                                                                                            07/11/2025

                                                                                                                            What are you talking about? When I view the tweet, the only text I see is:

                                                                                                                            > > fp8 is 100 tflops faster when the kernel name has "cutlass" in it

                                                                                                                            > kms

                                                                                                                              • saagarjha

                                                                                                                                07/11/2025

                                                                                                                                And it includes a link to show that this is the context it came from.

                                                                                                                                  • zahlman

                                                                                                                                    07/11/2025

                                                                                                                                    And when I look at the link, the part I quoted is the relevant text I see.

                                                                                                                                    In order to get to the part that you're trying to hold me accountable for, I would furthermore have to click onto the commits tab and search through a 93-commit PR.

                                                                                                                                    I thought today I was using a site where trying to think the best of people and propose that someone had taken something out of context, based on the immediately available context having a simpler explanation, would not get me treated like a corporate shill (for a company I don't even care about). Apparently I was wrong.

                                                                                                                                      • saagarjha

                                                                                                                                        07/11/2025

                                                                                                                                        I don't think you are a corporate shill. I do think that you immediately going "clearly the tweet is wrong" without doing any research whatsoever was unwarranted, though. You also keep bringing up that it's 93 commits, but all you have to do is search for "cutlass" to find out what is going on. I think you're obligated to do at least that when you call it out for being wrong.

                                                                                                                                          • zahlman

                                                                                                                                            07/11/2025

                                                                                                                                            How did you get "clearly" out of "appears to be"?

                                                                                                                                            How did you get "without doing any research whatsoever" out of me demonstrably following the link and reading and quoting what appeared on the facing page?

                                                                                                                                              • saagarjha

                                                                                                                                                07/13/2025

                                                                                                                                                I think if you spent the amount of effort you’ve used to reply to me here to skim through the PR you would’ve changed your mind. I understand that the code requires some domain knowledge to fully understand but I think even a cursory skim would be enough to disabuse you of the idea that the tweet was cherry-picking.