Claude Code is steganographically marking requests

714 points - today at 3:44 PM

meowface
today at 4:49 PM
Value judgment aside: I am a bit surprised at how sloppily they did this. I think they could've achieved the same effect while decreasing the odds of detection via reverse engineering.
(This field is known as "underhanded code", coined by the Underhanded C contest: https://www.underhanded-c.org. It's a little-known "art"; little-known for probably self-explanatory reasons. There are much cleverer ways of achieving objectives like this. One obviously being you can move more out of the client and into the server, but the other being you can write plausibly deniable client code in a much more benign-seeming way than this. Some of what they added can only be done on the client, but I think some could've been moved, and the client-required parts could've been done more subtly and credibly.)
It's possible they knew the JS bundle gets so heavily scrutinized that it'd eventually get spotted and reported on regardless so they didn't bother doing something more subtle and duplicitous. But still seems slightly lazy.
VortexLain
today at 4:31 PM
Codex CLI is FOSS, unlike Claude Code, so Codex is less likely to do things like that, and it's one more reason to avoid Claude Code and Claude in general. Hopefully, many eyes will be looking into Codex for malicious things like that.
croemer
today at 6:48 PM
I was skeptical because this is AI-written, but Claude Code with Sonnet 5 managed to reproduce it convincingly. Sure, I didn't manually verify, but it's a lot more trustworthy to have your own agent verify than to just trust a blog.
mrshadowgoose
today at 6:22 PM
The conclusion of this blog post is a bit hysterical. The intent of this steg is excruciatingly clear (identifying usage by Chinese firms that may be conducting model distillation). It's unclear how this "punishes normal developers" in any shape or form.
matheusmoreira
today at 4:53 PM
I reported a similar system prompt injection mechanism here:
https://news.ycombinator.com/item?id=48259288
https://github.com/anthropics/claude-code/issues/62061
Looks like they just keep finding new "creative" uses for such things, as expected. I'll keep patching them out.
tgsovlerkhgsel
today at 6:47 PM
The question is, what do they do when they see a tagged prompt? Do they flag/ban the account, or serve a degraded response? Are there some well-documented methods of serving a response that is still somewhat useful for what the prompt asks for, but really bad for distillation attempts?
MattDamonSpace
today at 4:04 PM
“So the feature mostly punishes the exact people who are easier to fingerprint: normal developers doing weird but legitimate things”
What’s the punishment here exactly?
edude03
today at 5:26 PM
I don't understand the privacy concerns the author is trying to highlight. Granted, doing anything "sneaky" will always raise suspicion once caught, but on the other hand, there would be no point in implementing these "security features" if they were upfront about how they work.
And no, IMO steganography isn't security by obscurity, in the same way that using RSA and keeping the private key private isn't security by obscurity: keeping the private thing private is part of the security model.
sebastiennight
today at 4:26 PM
Can somebody clarify for me - if ANTHROPIC_BASE_URL is set to a different provider... then isn't this "marked" system prompt being sent to that provider's API rather than Anthropic's?
I understand how this can be useful to Anthropic if the 3rd-party is acting as a proxy (because they end up hitting the Claude API with the marked prompt), but it looks like requests where "hostname contains deepseek" would never be sending data to Anthropic. What am I missing?
LPisGood
today at 4:03 PM
This is very interesting. Combating resellers and distillation seems like a very difficult problem indeed. Interesting to me is that these techniques mentioned in the article are just like anti-observation techniques used by some of the more sophisticated malware out there, however defeating them is pretty trivial.
sigmoid10
today at 4:16 PM
If they only collect the data for analysis I guess this is fine (they already get way more sensitive data from users anyways, so if privacy is your concern you've made the mistake many steps ago). The much more interesting question is if they directly act on this data in their API. For example by rate-limiting, compute-limiting or rerouting to weaker models. That might even be legally questionable. I would really like to see this as a follow-up analysis, but I guess it is way more difficult and will also cost quite a bit in tokens.
tgtweak
today at 4:38 PM
None of this is surprising - they're trying to mask and relay when they detect known patterns of what looks like distillation attacks and client app copying/modification. The list obfuscation here is likely to prevent or make it difficult for those same adversaries to work around this or delete/null it out when making a bootleg copy.
Cool reverse engineering/analysis report, but if this is the extent of nefarious activity that came of it (trying to catch/mitigate Chinese lab model distillation), that's kind of encouraging.
ryanisnan
today at 5:16 PM
This is weird but, help me understand how this meaningfully impacts our exposure.
I'm authenticated to Claude, so they already have the whole attribution thing solved.
throwawayffffas
today at 4:17 PM
Claude Code does feel very malwarey, to be honest. It has been like that from the start.
fny
today at 4:23 PM
This was already discovered during the source map leak.
> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.
They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?
port3000
today at 4:57 PM
That's a lot of effort when they could just play a short video saying 'You wouldn't steal a car' instead
100ms
today at 4:08 PM
What's the point of even trying to obfuscate this with such a simple method? Could at least have hidden the targeted features by storing their hashes or embedding a bloom filter or similar
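A minimal Python sketch of the hashing idea in the comment above, under stated assumptions: the client only needs to match whole hostname labels (split on "." and "-"), and the digest set would come from a hypothetical offline build step so the shipped code contains no plaintext names. The helper names `sha256_hex`, `build_digest_set`, and `hostname_matches` are hypothetical, and this is not how Claude Code is reported to do it (the thread describes a plain "hostname contains deepseek" check).

```python
import hashlib
import re


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of the lowercased UTF-8 text."""
    return hashlib.sha256(text.lower().encode("utf-8")).hexdigest()


def build_digest_set(words: list[str]) -> set[str]:
    """Hypothetical offline build step: only these digests would ship in the client."""
    return {sha256_hex(word) for word in words}


def hostname_matches(hostname: str, digests: set[str]) -> bool:
    """True if any dot- or hyphen-separated label of the hostname hashes into the set."""
    labels = [label for label in re.split(r"[.\-]", hostname.lower()) if label]
    return any(sha256_hex(label) in digests for label in labels)


if __name__ == "__main__":
    digests = build_digest_set(["deepseek"])
    print(hostname_matches("api.deepseek.com", digests))        # True
    print(hostname_matches("deepseek-proxy.example", digests))  # True
    print(hostname_matches("api.example.com", digests))         # False
```

Label matching is a deliberate trade-off: a hash can only be tested for exact equality, so unlike a substring check this misses a host like `mydeepseekproxy.com`. It also only defeats a casual grep of the bundle, since anyone can hash a list of likely words and recover short names. A Bloom filter, the other option mentioned, would hide the list in a similar way but is not shown here.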
chvid
today at 5:16 PM
(This sounds like a clumsy way of catching the Chinese, and one that can easily be side-stepped.)
Claude Code has more or less full access to the client computer. The server (that hosts the actual AI) can just go: execute this payload and tell me the result - otherwise I won't answer any further questions or re-route you to a stupider model.
The payload could check for Chinese time zones, scan the local hard drive for copies of the Little Red Book, or ping truth.social to see if it was behind the Great Firewall.
jacobgold
today at 5:04 PM
> "That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (for example, pi harness)"
You're actually trusting your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/
iqandjoke
today at 4:35 PM
It is about China detection. They seem to put a tracker on the email as well.
teravor
today at 6:13 PM
The Chinese they are trying to catch must be amateurs. The first thing you should do is construct a sandbox that looks indistinguishable from a common user's machine; the second is to put it behind a residential proxy.
dehrmann
today at 5:06 PM
Anthropic must think that their moat isn't very large if they're this worried about distillation.
epistasis
today at 5:21 PM
After loving Claude Code for most of its lifetime, I've been extremely annoyed by every change in the past months, even on the model level.
There seem to be all sorts of continual under-the-cover changes like this one that make life harder. It feels like the entire product has been taken over by overly ambitious PMs who care more about making their mark than about improving the experience, and all of their marks have made me less productive.
I've been using Pi with GLM5.2 the past few days, and though it's expensive, I find it far more productive and less annoying. The remote session plugin is far more reliable, I don't need to intuit some undocumented usage pattern to figure out how to use it well, and it just works.
827a
today at 5:00 PM
This seems really, really stupid. Similar to the weird Zig runtime signature thing from a few months ago, it was bound to be discovered, quickly, and all the resellers have to do is find a new domain name that (checks notes) doesn't have the word DEEPSEEK in it. Like, seriously? Your goal was to identify resellers by checking if the proxy has the corporate name of one of your competitors in it? Is this amateur hour?
All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.
an0malous
today at 5:14 PM
Is this why Claude never knows what date and time it is right now?
TZubiri
today at 6:37 PM
based and steganopilled
Klonoar
today at 4:19 PM
If there weren't already enough tells that something is AI-generated, I guess you could add this to the list.
ahmedehab_01
today at 4:21 PM
Frankly, I don't see this as the concerning behaviour the article describes. It is fine to try to protect against distillation through a technique like this. This will also allow them to, instead of blocking the distillation agents, respond with a poorer result/model, hindering the progress of distillation, momentarily at least.
I would guess that's their first line of defense; they should have more techniques to identify distillation because that's a very simple way of detecting the host and can be easily spoofed.
MangoCoffee
today at 4:55 PM
The AI race right now is in a sad state. China's playbook is to release open-weight models and train them on their own chips.
Anthropic pushes fear and control. But the only way to win is by innovating. China is flooding the market with cheap, good enough models, while the U.S. is building a Chinese firewall.
a_c
today at 4:35 PM
It piqued my interest. I think I’ve found a weekend project
ZappoMan
today at 6:02 PM
One more example of "I thought Anthropic was supposed to be the good guys."
hhh
today at 4:10 PM
Cool fingerprinting avenue.
SaaShack26
today at 5:03 PM
I use it too
mosfets
today at 4:53 PM
I clicked the link to learn what steganography means...
ductsurprise
today at 4:41 PM
Is it just a minified localization(l10n) function maybe?
phendrenad2
today at 4:55 PM
Non-hugged: https://archive.is/Wdhp0
bitlad
today at 5:05 PM
Silicon Valley season 6 was on point.
bibimsz
today at 5:36 PM
this is the one they wanted us to find
wolttam
today at 4:18 PM
I used Claude Code for a month because my boss gifted me a sub and wanted me to try it.
I used that month to complete a work project and then beef up my personal harness so I'd never have to deal with Anthropic (and these sorts of shenanigans) again.
ajross
today at 4:38 PM
Headline is, frankly, awful. This isn't the AI secretly doing stuff and hiding it. This is the very human Anthropic engineers trying to detect Chinese scraping via some frankly hamfisted and unimaginative URL trickery.
grayhatter
today at 4:14 PM
Here's the sha of the prompt I submitted... no I don't know why there are no saved prompts with that sha.
What do you mean you don't know where the bug is coming from?
No, I absolutely didn't make it up, how could you accuse me of that?
Does anyone know why this regex isn't working? I double-checked it 27 times, I even asked the LLM. They all say this regex should be finding these dates.
Weird, suddenly all the conversations are breaking when I feed them into this other tool? Something about UTF-8 errors, but I'm sure I'm only using ASCII?
I do try to take care to make sure the things I build can be used by other people even when they care about different things. I care about understandability, determinism (as it relates to computing), and repeatability (because I want to be able to trust the systems I use).
If y'all would be willing to try to account for use cases of others, and try not to break them... that would be nice.
Please note that, generally, when you modify something that belongs to someone else without telling them, things should be expected to break.
theplumber
today at 4:06 PM
The more I learn about Anthropic, the more they disgust me. Fingers crossed for all the companies on their “ban list”.
felipelalli
today at 4:59 PM
Ridiculous.
love0972
today at 4:16 PM
Is that really how it is? How will this affect our future?