Show HN: Badge that shows how well your codebase fits in an LLM's context window
76 points - today at 3:14 PM
Small codebases were always a good thing. With coding agents, there's now a huge advantage to having a codebase small enough that an agent can hold the full thing in context.
Repo Tokens is a GitHub Action that counts your codebase's size in tokens (using tiktoken) and updates a badge in your README. The badge color reflects what percentage of an LLM's context window the codebase fills: green for under 30%, yellow for 30-70%, red for 70%+. The context window size is configurable and defaults to 200k (the context size of Claude models).
It's a composite action. Installs tiktoken, runs ~60 lines of inline Python, takes about 10 seconds. The action updates the README but doesn't commit, so your workflow controls the git strategy.
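The threshold logic is simple enough to sketch in a few lines (this is my reconstruction from the percentages above, not the action's actual code):

```python
def badge_color(token_count: int, context_window: int = 200_000) -> str:
    """Map a repo's token count to a badge color based on what
    percentage of the context window it fills."""
    pct = token_count / context_window * 100
    if pct < 30:
        return "green"
    if pct < 70:
        return "yellow"
    return "red"
```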
The idea is to make token size a visible metric, like bundle size badges for JS libraries. Hopefully a small nudge to keep codebases lean and agent-friendly.
GitHub: https://github.com/qwibitai/nanoclaw/tree/main/repo-tokens
Maybe it’s useful to dig out the concept of modularization, with its distinction between interface and implementation, and construct agents that are able to make effective use of it.
In the case that interfaces remain unchanged, agents only need to look at the implementation of a single module at a time plus the interfaces it consumes and implements. And when changing interfaces, agents only need to look at the interfaces of the modules concerned, and at most a limited number of implementation considerations.
It’s the very reason why we humans invented modularization: so that we don’t have to hold the complete codebase in our heads (“context windows”) in order to reason about it and make changes to it in a robust and well-grounded way.
bee_rider
today at 5:36 PM
Maybe it could just measure the number of tokens for the examples (and then summarize what the examples show, under the assumption that that’s the actual functionality of the project). I’m 90% joking… but that last 10% makes me wonder…
GeoAtreides
today at 6:58 PM
functional programming get recked, OOP is back, baby!
This is orthogonal to that. Interfaces are types, implementations are lambda bodies. There you go.
Funny, but isn't composable flow an aim of FP as well?
With types that are even more suitable for LLMs, and "contracts at the edges".
Useful and useless (or good and “less good”) aren’t easily mapped to big and small.
From a purely UX perspective, showing a red badge seems like you’re conflating “less good” with size. Who is the target for this? Lots of useful codebases are large.
I do agree, however, that there’s value in splitting up domains into something a human can easily learn and keep in their head after, say, a few days of being deeply entrenched. Tokens could actually be a good proxy for this.
iterateoften
today at 3:48 PM
> Who is the target for this?
Agents. There are going to be more tools and software targeted for consumption by agents.
adam_arthur
today at 4:21 PM
Yeah, but a large monorepo can consist of many small subprojects. And arguably this is becoming a best practice.
Just spawn the agent in one of the subprojects
Doohickey-d
today at 5:47 PM
For at least some codebases, I'm not sure this is a useful metric. Because you don't usually put the whole codebase in your context at the same time.
For example in my current case, there are lots of files with CSS, SVG icons in separate files, old database migration scripts, etc. Those don't go in the LLM context 99% of the time.
Maybe a more useful metric would be "what percentage of files that have been edited in the last {n} days fit in the context"?
Outside of packages, I doubt many of my codebases would fit into this. But the individual domain areas would. I don't care about users in an orders context, I don't care about payments when dealing with imports; no reason an AI should care either. It shouldn't care about implementations if there's an interface referenced, and it shouldn't worry about the front end when it's dealing with the back end, etc.
Scoping the AI to only use the things you'd use seems far wiser than trying to shrink your codebase so it can look at the whole thing when 90% of it is irrelevant.
nicoburns
today at 7:12 PM
> Small codebases were always a good thing. With coding agents, there's now a huge advantage to having a codebase small enough that an agent can hold the full thing in context.
It is somewhat ironic that coding agents are notorious for generating much more code than necessary!
I haven't cared too much about repo tokens in a good while.
My coolest app was a better context creator, but I found it hard to extend to actual agentic coding use. Agentic discovery is generally useful and reliable, and the token overhead can be managed by the harness (e.g. Claude Code).
https://prompttower.com/
I'm curious whether there's a deep need for the entire codebase to be consumed in the first place.
It would be better to have the architecture support a more decoupled/modular design if you're going to rely heavy on LLMs.
That, or let it consume high-quality, maintained documentation?
joshmarlow
today at 5:57 PM
On a related note, this type of reasoning is what made me flip my opinion on microservices. I've generally been skeptical of a many-microservice architecture for the last decade but LLMs change that - a small microservice is more likely to fit in a context window.
I think this gestures at a more general point - we're still focusing on how to integrate LLMs into existing dev tooling paradigms. We squeeze LLMs into IDEs for human dev ergonomics but we should start thinking about LLM dev ergonomics - what idioms and design patterns make software development easiest for AIs?
They don't need to be services. You can - and many projects do - structure your code as a set of loosely coupled modules. Each module has a responsibility or set of responsibilities. They communicate with each other via well defined interfaces. For exposing code like this to an LLM, you would have them make a change to one or sometimes two modules, with access to the interface docs of all the other modules. The disadvantage of this compared to microservices is that if a module crashes it will take the entire process down with it, you can't move a module onto a different machine or create multiple instances of it as easily, etc. The advantage is that communication is done via function calls, which are simpler and more efficient than rpc.
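A rough sketch of that style in Python (module and function names here are invented for illustration): callers, and an agent's context, only ever need the interface, never the implementation behind it.

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The interface other modules (and an agent's context) see."""
    def charge(self, amount_cents: int) -> bool: ...

class FakeGateway:
    """One implementation, living in its own module; swappable
    without touching any caller."""
    def charge(self, amount_cents: int) -> bool:
        # a real implementation would call a provider SDK here
        return amount_cents > 0

def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    """A consumer module that depends only on the interface."""
    return "paid" if gateway.charge(amount_cents) else "failed"
```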
> I think this gestures at a more general point - we're still focusing on how to integrate LLMs into existing dev tooling paradigms.
This is what we should be doing, for a couple of reasons. For one thing, humans don't have an entire codebase "in context" at a time. We should recognize that the limitations of an AI mirror the limitations of a person, and hence can have similar solutions. For another, the limitations of today's LLMs will not be the limitations of tomorrow's LLMs. Redesigning our code to suit today's limitations will only cause us trouble down the road.
Microservices are about deployment, less about code structure. You can have the same code modularization like microservices provide within a monolith instead, for example in the form of libraries. Conversely, you can in principle build several distinct microservices out of the same shared codebase.
SignalStackDev
today at 6:04 PM
[dead]
Interesting idea, but I think it might have made more sense to use something like repomix to generate the source bundle and run tiktoken on that. Practically speaking, you don't send many source files in raw text form; either they have some sort of file wrapper with metadata, or they're pulled in from a tool call where the tool call arguments act as the metadata.
This is an interesting concept. Thank you for sharing. I have an export.sh or export.ps1 script that takes the relevant files in my repository and puts them in a `dump.txt` file inside `docs/llm`.
I am not very good with AI though. Is there a quick and easy way to calculate token count and add this to my dump.txt file, ideally using just simple, included by default Linux tools in bash or simple, included by default Windows tools in powershell?
Thank you in advance.
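One rough approach, assuming the common heuristic of ~4 characters per token for English text and code (an exact count needs a real tokenizer like tiktoken):

```shell
# Approximate token count of a file using only coreutils.
approx_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}

# e.g.:
# approx_tokens docs/llm/dump.txt
```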
It's a fun, "style of the time" thing to track, but within a year or two, context window limitations won't be a thing.
Doubt me?
Think back two years, then compare to today. Change is happening at massive speed, and this issue is near the top of the list to be resolved in some fashion.
written-beyond
today at 4:16 PM
Gemini 1.5 announced a 1-million-token context window back in 2024. I admire this forward-looking view of new technologies, especially when historical HN posts/comments show how bad people can be at predictions.
If we look back two years, companies weren't investing so heavily in training their LLMs on code. Whatever code they got their hands on was simply what happened to be in the training corpus; it's well known that the most recent improvements in LLM coding productivity came after companies spent millions paying different labs to produce more coding datasets for them.
So LLMs have gotten a lot better at not needing the entire codebase in context at once, because their weights are already so well tuned to development environments that they can better infer and index things as needed. However, I fail to see how the context window limitation would ever stop being an issue, since it's a fundamental part of the real world. Will we get better and more efficient ways of splitting and indexing context windows? Surely. Will that reduce our fear of soiling our contexts with bad prompt-response cycles? Probably not...
I’m not so sure an increasingly large context window will be seen as the critical enabler it was viewed as 6 months ago, after watching how amazingly effective subagents and tool calls are at tackling parts of the problem and surfacing just the relevant bits for the task at hand. And if increasing the context window isn’t the current bottleneck, effort will be put elsewhere.
I agree. My suspicion is that token efficiency is what will drive more efficient tool calls and tool building. And we want that: agents should rely less on raw intelligence (the ability to hold everything in context) and more on building tools to get the job done.
If you’re worried about fitting the window, make a RAG holding an AST transformation of your codebase
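As a sketch of the "interface only" view such a pipeline might index (my illustration using Python's ast module, not a full RAG):

```python
import ast

def extract_signatures(source: str) -> list[str]:
    """Strip a module down to its function/class signatures:
    the interface-level view one might embed in a RAG index,
    leaving implementation bodies out of the context entirely."""
    sigs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
    return sigs
```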
Some say that the ideal size of an individual function in a codebase is related to the amount of information you can hold in working memory. Maybe the ideal size for a library is the amount you can fit in an LLM context window?
Towaway69
today at 3:42 PM
What’s the going rate for tokens in terms of dollars? How much are companies spending on “tokens”?
Also kind of ironic that small codebases are now in vogue, right after Google-style monolithic repos were so popular.
> What’s the going rate for tokens in terms of dollars?
It depends on the provider/model, usually pricing is calculated as $/million tokens with input/output tokens having different per token pricing (output tends to be more expensive than input). Some models also charge more per token if the context size is above a threshold. Cached operations may also reduce the price per token.
OpenRouter has a good overview over provider and models, https://openrouter.ai/models
The math on what people are actually paying is hard to evaluate. Ime, most companies rather buy a subscription than give their developers API keys (as it makes spending predictable).
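To make the arithmetic concrete, a toy cost calculator (the rates here are invented; real prices vary by provider, model, context size, and caching):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 3.00,
                 output_per_m: float = 15.00) -> float:
    """Dollar cost of one request, given $/million-token rates.
    Output tokens typically cost several times more than input."""
    return (input_tokens / 1e6) * input_per_m \
         + (output_tokens / 1e6) * output_per_m
```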
Towaway69
today at 3:56 PM
Api keys with hard limits I assume?
Are there companies out there that add token counts to ticket “costs”, i.e. are story points being replaced/augmented by token counts?
Or even worse, an exchange rate of story points to tokens used…
> Ime, most companies rather buy a subscription than give their developers API keys (as it makes spending predictable).
The downside with subscriptions is that your work with the LLM will grind to a halt for a number of hours if you hit the token limit. I was doing what I consider very trivial work adding Javadoc comments to a few dozen files using Claude Sonnet on the $20 plan and within 30 minutes had been told to sit out for a couple hours. The reason was that Claude was apparently repeatedly sending the files up and down to fill in the comments. In hindsight, sure, that's obvious, but you would think that Claude would be smart enough to do some sort of summarization to make things more efficient. Looking into it, it was on the order of several million tokens in a very short amount of time.
It really made me wonder how in the hell people are using Claude to do "real" work, but I've heard of people having multiple $200/month subscriptions, so I guess that could work. Definitely seems like a glimpse into the future of what these services will truly cost once people are hooked on them.
Towaway69
today at 6:11 PM
I know of a corporation that has embraced Claude for documenting their codebase, in order to better use Claude for coding on it.
For Claude to understand the codebase, it needs to document it first. That makes sense, and it's also great for humans, because now there is up-to-date documentation of the codebase.
I don’t know how much it cost, but the codebase, I’m told, is around 2 to 3 million lines of code.
spicyusername
today at 4:04 PM
I'm not sure that smaller codebases are always better.
unglaublich
today at 4:37 PM
value/size
Interesting concept, but is it going to age well, with models' context sizes changing all the time (growing, mostly)?
max context sizes are probably going to go up, but smaller contexts will always be cheaper/more-efficient than larger ones
KingOfCoders
today at 4:18 PM
Interesting, but not adding something to my CI for a badge, too paranoid.
agentica_ai
today at 3:29 PM
Smart idea. Token budgets are becoming the new line count metric for the LLM era.
irishcoffee
today at 3:32 PM
Nah. I can write a whole program using 0 tokens, I can’t write a whole program with 0 lines of code.
marsven_422
today at 8:18 PM
[dead]
ai-christianson
today at 3:54 PM
[flagged]
It’s interesting but I think it’s measuring the wrong thing. Abstraction is a fundamental principle in software. As a human, I’ve worked with classes and modules far larger than what fits in my head, just because I’m only fitting the function signatures and purpose into my head, and not the implementation details. In practice I find Claude really good at extracting useful information in a human-like way from a codebase. It doesn’t usually stuff the entire codebase into its context window.
Also, this rewards dynamic languages over typed languages, and penalizes comments, descriptive function names, etc. Though frankly, it'd be interesting to see whether AI would work better with a project in JavaScript that barely fits in context, or the same thing in TypeScript that overflows. I could imagine either, but my guess is "it depends". Though "depends on what" would be interesting to know.
Still, this seems useful for being able to see at a glance. I have no idea where most of my own projects would land.
ciaranmca
today at 9:03 PM
Never thought about the impact of comments, perhaps there is value in stripping those out of read file tools
hal9000xbot
today at 3:54 PM
[flagged]