My favorite part of the paper is that the "attack" isn't just exploiting a bug - it's exploiting how different components interpret the same input. Modifying an executable as it's loaded into memory is one example, but the deeper pattern is the mismatch.
What's interesting about the malware in this post is that it goes one step further: instead of exploiting mismatches, it corrupts the computation itself - so every infected system agrees on the same wrong answer!
More broadly: any interpretive mismatch between components creates a failure surface. Sometimes it shows up as a bug, sometimes as an exploit primitive, sometimes as a testing blind spot. You see it everywhere - this paper, IDS vs OS, proxies vs backends, test vs prod, and now LLMs vs "guardrails."
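To make the "proxies vs backends" flavor of this concrete, here is a toy sketch (not taken from the paper or the post). The two functions and the `user` parameter are hypothetical stand-ins I made up for illustration; the only real API used is Python's `urllib.parse.parse_qs`. The assumption is that a front-end check reads the first copy of a duplicated query parameter while the backend acts on the last one:

```python
from urllib.parse import parse_qs


def filter_view(query: str) -> str:
    """Hypothetical front-end check: looks only at the FIRST 'user' value."""
    return parse_qs(query)["user"][0]


def backend_view(query: str) -> str:
    """Hypothetical backend: acts on the LAST 'user' value."""
    return parse_qs(query)["user"][-1]


# Same bytes on the wire, two different interpretations.
query = "user=guest&user=admin"
print("filter sees: ", filter_view(query))
print("backend sees:", backend_view(query))
```

Neither function is buggy on its own; the failure surface only exists because the two disagree about what the same input means.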
Fun HN moment for me: as I was about to post this, I noticed a reply from @tptacek himself. His 1998 paper with Newsham (IDS vs OS mismatches) was my first exposure to this idea - and in hindsight it nudged me toward infosec, the Atlanta scene, spam filtering (PG's Bayesian stuff) and eventually YC.
https://users.ece.cmu.edu/~adrian/731-sp04/readings/Ptacek-N...
The paper starts with this Einstein quote: "Not everything that is counted counts and not everything that counts can be counted", which seems quite apt for the malware analyzed here :)
Just curious, are you purposely mocking the LLM writing style?
That's how everybody in academia, tech, and published authors in general used to write.
Where do you think the LLM is getting it from? ^_^