Show HN: Badge that shows how well your codebase fits in an LLM's context window
76 points - today at 3:14 PM
Small codebases were always a good thing. With coding agents, there's now a huge advantage to having a codebase small enough that an agent can hold the full thing in context.
Repo Tokens is a GitHub Action that counts your codebase's size in tokens (using tiktoken) and updates a badge in your README. The badge color reflects what percentage of an LLM's context window the codebase fills: green for under 30%, yellow for 30-70%, red for 70%+. The context window size is configurable and defaults to 200k (the context size of Claude models).
It's a composite action. Installs tiktoken, runs ~60 lines of inline Python, takes about 10 seconds. The action updates the README but doesn't commit, so your workflow controls the git strategy.
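The threshold logic is simple enough to sketch in a few lines (this is my reconstruction from the percentages above, not the action's actual code):

```python
def badge_color(token_count: int, context_window: int = 200_000) -> str:
    """Map a repo's token count to a badge color based on what
    percentage of the context window it fills."""
    pct = token_count / context_window * 100
    if pct < 30:
        return "green"
    if pct < 70:
        return "yellow"
    return "red"
```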
The idea is to make token size a visible metric, like bundle size badges for JS libraries. Hopefully a small nudge to keep codebases lean and agent-friendly.
GitHub: https://github.com/qwibitai/nanoclaw/tree/main/repo-tokens
Maybe it’s useful to dig out the concept of modularization, with its distinction between interface and implementation, and construct agents that are able to make effective use of it.
In the case that interfaces remain unchanged, agents only need to look at the implementation of a single module at a time plus the interfaces it consumes and implements. And when changing interfaces, agents only need to look at the interfaces of the modules concerned, and at most a limited number of implementation considerations.
It’s the very reason why we humans invented modularization: so that we don’t have to hold the complete codebase in our heads (“context windows”) in order to reason about it and make changes to it in a robust and well-grounded way.
bee_rider
today at 5:36 PM
Maybe it could just measure the number of tokens for the examples (and then summarize what the examples show, under the assumption that that’s the actual functionality of the project). I’m 90% joking… but that last 10% makes me wonder…
GeoAtreides
today at 6:58 PM
functional programming get recked, OOP is back, baby!
This is orthogonal to that. Interfaces are types, implementations are lambda bodies. There you go.
Funny, but isn't composable flow an aim of FP as well?
With types that are even more suitable for LLMs, and "contracts at the edges".
Useful and useless (or good and “less good”) aren’t easily mapped to big and small.
From a purely UX perspective, showing a red badge seems like you’re conflating “less good” with size. Who is the target for this? Lots of useful codebases are large.
I do agree, however, that there’s value in splitting up domains into something a human can easily learn and keep in their head after, say, a few days of being deeply entrenched. Tokens could actually be a good proxy for this.
iterateoften
today at 3:48 PM
> Who is the target for this?
Agents. There are going to be more tools and software targeted for consumption by agents.
adam_arthur
today at 4:21 PM
Yeah, but a large monorepo can consist of many small subprojects. And arguably this is becoming a best practice.
Just spawn the agent in one of the subprojects
Doohickey-d
today at 5:47 PM
For at least some codebases, I'm not sure this is a useful metric. Because you don't usually put the whole codebase in your context at the same time.
For example in my current case, there are lots of files with CSS, SVG icons in separate files, old database migration scripts, etc. Those don't go in the LLM context 99% of the time.
Maybe a more useful metric would be "what percentage of files that have been edited in the last {n} days fit in the context"?
Outside of packages, I doubt many of my codebases would fit into this. But the individual domain areas would. I don't care about users in an orders context, I don't care about payments when dealing with imports; no reason an AI should care either. It shouldn't care about implementations if there's an interface referenced, and it shouldn't worry about the front end when it's dealing with the back end, etc.
Scoping the AI to only use the things you'd use seems far wiser than trying to shrink your codebase so it can look at the whole thing when 90% of it is irrelevant.
nicoburns
today at 7:12 PM
> Small codebases were always a good thing. With coding agents, there's now a huge advantage to having a codebase small enough that an agent can hold the full thing in context.
It is somewhat ironic that coding agents are notorious for generating much more code than necessary!
I haven't cared too much about repo tokens in a good while.
My coolest app was a better context creator, but I found it hard to extend to actual agentic coding use. Agentic discovery is generally useful and reliable, and the token overhead can be managed by the harness (e.g. Claude Code).
https://prompttower.com/
I'm curious whether there's a deep need for the entire codebase to be consumed in the first place.
It would be better to have the architecture support a more decoupled/modular design if you're going to rely heavy on LLMs.
That, or let it consume high-quality, maintained documentation?
joshmarlow
today at 5:57 PM
On a related note, this type of reasoning is what made me flip my opinion on microservices. I've generally been skeptical of a many-microservice architecture for the last decade but LLMs change that - a small microservice is more likely to fit in a context window.
I think this gestures at a more general point - we're still focusing on how to integrate LLMs into existing dev tooling paradigms. We squeeze LLMs into IDEs for human dev ergonomics but we should start thinking about LLM dev ergonomics - what idioms and design patterns make software development easiest for AIs?
They don't need to be services. You can - and many projects do - structure your code as a set of loosely coupled modules. Each module has a responsibility or set of responsibilities. They communicate with each other via well defined interfaces. For exposing code like this to an LLM, you would have them make a change to one or sometimes two modules, with access to the interface docs of all the other modules. The disadvantage of this compared to microservices is that if a module crashes it will take the entire process down with it, you can't move a module onto a different machine or create multiple instances of it as easily, etc. The advantage is that communication is done via function calls, which are simpler and more efficient than rpc.
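A rough sketch of that style in Python (module and function names here are invented for illustration): callers, and an agent's context, only ever need the interface, never the implementation behind it.

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The interface other modules (and an agent's context) see."""
    def charge(self, amount_cents: int) -> bool: ...

class FakeGateway:
    """One implementation, living in its own module; swappable
    without touching any caller."""
    def charge(self, amount_cents: int) -> bool:
        # a real implementation would call a provider SDK here
        return amount_cents > 0

def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    """A consumer module that depends only on the interface."""
    return "paid" if gateway.charge(amount_cents) else "failed"
```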
> I think this gestures at a more general point - we're still focusing on how to integrate LLMs into existing dev tooling paradigms.
This is what we should be doing, for a couple of reasons. For one thing, humans don't have an entire codebase "in context" at a time. We should recognize that the limitations of an AI mirror the limitations of a person, and hence can have similar solutions. For another, the limitations of today's LLMs will not be the limitations of tomorrow's LLMs. Redesigning our code to suit today's limitations will only cause us trouble down the road.
Microservices are about deployment, less about code structure. You can have the same code modularization like microservices provide within a monolith instead, for example in the form of libraries. Conversely, you can in principle build several distinct microservices out of the same shared codebase.
SignalStackDev
today at 6:04 PM
[dead]
Interesting idea, but I think it might have made more sense to use something like repomix to generate the source bundle and run tiktoken on that. Practically speaking, you don't send many source files in raw text form; either they have some sort of file wrapper with metadata, or they're pulled in from a tool call where the tool call arguments act as the metadata.
This is an interesting concept. Thank you for sharing. I have an export.sh or export.ps1 script that takes the relevant files in my repository and puts them in a `dump.txt` file inside `docs/llm`.
I am not very good with AI though. Is there a quick and easy way to calculate token count and add this to my dump.txt file, ideally using just simple, included by default Linux tools in bash or simple, included by default Windows tools in powershell?
Thank you in advance.
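One rough approach, assuming the common heuristic of ~4 characters per token for English text and code (an exact count needs a real tokenizer like tiktoken):

```shell
# Approximate token count of a file using only coreutils.
approx_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}

# e.g.:
# approx_tokens docs/llm/dump.txt
```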
It's a fun, "style of the time" thing to track, but within a year or two, context window limitations won't be a thing.
Doubt me?
Think back two years, then compare to today. Change is happening at massive speed, and this issue is near the top of the list to be resolved in some fashion.
written-beyond
today at 4:16 PM
Gemini 1.5 announced a 1-million-token context window back in 2024. I admire this forward-looking view of new technologies, especially when historical HN posts/comments show how bad people can be at predictions.
If we look back two years, companies weren't investing so heavily in training their LLMs on code. Whatever code they got their hands on was simply what happened to be in the training corpus; it's well known that the most recent improvements in LLM coding productivity came after companies spent millions paying different labs to produce more coding datasets for them.
So LLMs have gotten a lot better at not needing the entire codebase in context at once, because their weights are already so well tuned to development environments that they can better infer and index things as needed. However, I fail to see how the context window limitation would ever stop being an issue, since it's a fundamental part of the real world. Will we get better and more efficient ways of splitting and indexing context windows? Surely. Will that reduce our fear of soiling our contexts with bad prompt-response cycles? Probably not...
I’m not so sure an increasingly large context window will be seen as the critical enabler it was viewed as 6 months ago, after watching how amazingly effective subagents and tool calls are at tackling parts of the problem and surfacing just the relevant bits for the task at hand. And if increasing the context window isn’t the current bottleneck, effort will be put elsewhere.
I agree. My suspicion is that token efficiency is what will drive more efficient tool calls and tool building. And we want that: agents should rely less on raw intelligence (the ability to hold everything in context) and more on building tools to get the job done.
If you’re worried about fitting the window, make a RAG holding an AST transformation of your codebase
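As a sketch of the "interface only" view such a pipeline might index (my illustration using Python's ast module, not a full RAG):

```python
import ast

def extract_signatures(source: str) -> list[str]:
    """Strip a module down to its function/class signatures:
    the interface-level view one might embed in a RAG index,
    leaving implementation bodies out of the context entirely."""
    sigs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
    return sigs
```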
Some say that the ideal size of an individual function in a codebase is related to the amount of information you can hold in working memory. Maybe the ideal size for a library is the amount you can fit in an LLM context window?
Towaway69
today at 3:42 PM
What’s the going rate for tokens in terms of dollars? How much are companies spending on “tokens”?
Also kind of ironic that small codebases are now in vogue, right after Google-style monolithic repos were so popular.
> What’s the going rate for tokens in terms of dollars?
It depends on the provider/model, usually pricing is calculated as $/million tokens with input/output tokens having different per token pricing (output tends to be more expensive than input). Some models also charge more per token if the context size is above a threshold. Cached operations may also reduce the price per token.
OpenRouter has a good overview over provider and models, https://openrouter.ai/models
The math on what people are actually paying is hard to evaluate. Ime, most companies rather buy a subscription than give their developers API keys (as it makes spending predictable).
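To make the arithmetic concrete, a toy cost calculator (the rates here are invented; real prices vary by provider, model, context size, and caching):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = 3.00,
                 output_per_m: float = 15.00) -> float:
    """Dollar cost of one request, given $/million-token rates.
    Output tokens typically cost several times more than input."""
    return (input_tokens / 1e6) * input_per_m \
         + (output_tokens / 1e6) * output_per_m
```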
Towaway69
today at 3:56 PM
Api keys with hard limits I assume?
Are there companies out there that add token counts to ticket “costs”, i.e. are story points being replaced/augmented by token counts?
Or even worse, an exchange rate of story points to tokens used…
> Ime, most companies rather buy a subscription than give their developers API keys (as it makes spending predictable).
The downside with subscriptions is that your work with the LLM will grind to a halt for a number of hours if you hit the token limit. I was doing what I consider very trivial work adding Javadoc comments to a few dozen files using Claude Sonnet on the $20 plan and within 30 minutes had been told to sit out for a couple hours. The reason was that Claude was apparently repeatedly sending the files up and down to fill in the comments. In hindsight, sure, that's obvious, but you would think that Claude would be smart enough to do some sort of summarization to make things more efficient. Looking into it, it was on the order of several million tokens in a very short amount of time.
It really made me wonder how in the hell people are using Claude to do "real" work, but I've heard of people having multiple $200/month subscriptions, so I guess that could work. Definitely seems like a glimpse into the future of what these services will truly cost once people are hooked on them.
Towaway69
today at 6:11 PM
I know of a corporation that has embraced Claude for documenting their codebase, in order to better use Claude for coding on it.
For Claude to understand the codebase, it needs to document it first. That makes sense, and it's also great for humans, because now there is up-to-date documentation of the codebase.
I don’t know how much it cost, but the codebase, I’m told, is around 2 to 3 million lines of code.
spicyusername
today at 4:04 PM
I'm not sure that smaller codebases are always better.
unglaublich
today at 4:37 PM
value/size
Interesting concept, but is it going to age well, with models' context sizes changing all the time (growing, mostly)?
max context sizes are probably going to go up, but smaller contexts will always be cheaper/more-efficient than larger ones
KingOfCoders
today at 4:18 PM
Interesting, but not adding something to my CI for a badge, too paranoid.
agentica_ai
today at 3:29 PM
Smart idea. Token budgets are becoming the new line count metric for the LLM era.
irishcoffee
today at 3:32 PM
Nah. I can write a whole program using 0 tokens, I can’t write a whole program with 0 lines of code.
marsven_422
today at 8:18 PM
[dead]
ai-christianson
today at 3:54 PM
[flagged]
It’s interesting but I think it’s measuring the wrong thing. Abstraction is a fundamental principle in software. As a human, I’ve worked with classes and modules far larger than what fits in my head, just because I’m only fitting the function signatures and purpose into my head, and not the implementation details. In practice I find Claude really good at extracting useful information in a human-like way from a codebase. It doesn’t usually stuff the entire codebase into its context window.
Also, this rewards dynamic languages over typed languages, and penalizes comments, descriptive function names, etc. Though frankly, it'd be interesting to see whether AI would work better with a project in JavaScript that barely fits in context, or the same thing in TypeScript that overflows. I could imagine either, but my guess is "it depends". Though "depends on what" would be interesting to know.
Still, this seems useful for being able to see at a glance. I have no idea where most of my own projects would land.
ciaranmca
today at 9:03 PM
Never thought about the impact of comments, perhaps there is value in stripping those out of read file tools
hal9000xbot
today at 3:54 PM
[flagged]